shrslib package

shrslib.basicfunc module

Copyright (c) 2022 Masayuki TAKAHASHI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

shrslib.basicfunc.calculate_Tm_value(sequence, conc=0.5, Na_conc=0.05, mode='min')

Calculate the Tm value.

Parameters
  • sequence (str, nucleotide_sequence, list, tuple or numpy.ndarray) – The sequence whose Tm you want to calculate, or a list, tuple, or numpy.ndarray of such sequences. (required)

  • conc (float) – Primer concentration in the PCR reaction. (default: 0.5 picomole/microliter)

  • Na_conc (float) – Sodium ion concentration in the PCR reaction. (default: 0.05 mole/liter)

  • mode (str) – One of ‘min’, ‘max’, ‘average’, or ‘’ (empty string). (default: ‘min’)

Returns

Tm_value

Return type

numpy.float or list
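The sketch below illustrates one way a single Tm per degenerate primer could be obtained. It assumes that mode summarizes the Tm values of all A/C/G/T expansions of the primer and that ‘’ returns every value as a list; neither assumption is confirmed above. It also swaps in the basic salt-adjusted formula Tm = 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) − 600/N in place of whatever model shrslib uses; unlike calculate_Tm_value, this formula ignores primer concentration (conc). The names IUPAC, salt_adjusted_tm, and tm_summary are hypothetical.

```python
import itertools
import math
from statistics import mean

# IUPAC codes: each (possibly degenerate) base and the A/C/G/T bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def salt_adjusted_tm(seq, na_conc=0.05):
    """Basic salt-adjusted Tm in degrees C for an A/C/G/T-only sequence.

    Stand-in formula (not shrslib's model); primer concentration is ignored:
    Tm = 81.5 + 16.6 * log10([Na+]) + 0.41 * (%GC) - 600 / N
    """
    seq = seq.upper()
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 81.5 + 16.6 * math.log10(na_conc) + 0.41 * gc_percent - 600.0 / len(seq)


def tm_summary(seq, na_conc=0.05, mode="min"):
    """Hypothetical helper: Tm of every expansion of a degenerate primer,
    summarized by mode ('min', 'max', 'average'); any other mode returns the list."""
    expansions = ("".join(p) for p in itertools.product(*(IUPAC[b] for b in seq.upper())))
    values = [salt_adjusted_tm(s, na_conc) for s in expansions]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "average":
        return mean(values)
    return values
```

With mode='min', the primer is characterized by its least stable (lowest-GC) expansion, which is the conservative choice when picking an annealing temperature.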

shrslib.basicfunc.complementary_sequence(sequence)

Convert an input sequence to a complementary sequence.

Parameters

sequence (str, nucleotide_sequence, list, tuple) – The input sequence, or a list or tuple of sequences. (required)

Returns

New_sequences – The complementary sequence of the input sequence.

Return type

str
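The documentation above does not state whether the result is also reversed, so the sketch below shows both a plain complement and a reverse complement for IUPAC codes, including wobble bases. The helper names are hypothetical and not part of shrslib.

```python
# Each IUPAC code mapped to its complement (degenerate codes map to the complementary set).
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def complement(seq):
    """Base-by-base complement, keeping the original order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement read backwards, i.e. the opposite strand written 5'->3'."""
    return complement(seq)[::-1]
```

For example, complement("ATGC") gives "TACG", while reverse_complement("ATGC") gives "GCAT".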

shrslib.basicfunc.read_sequence_file(input, format='fasta', Feature=False)

Read a sequence file. FASTA, Multi-FASTA, and GenBank sequence format files are accepted.

Parameters
  • input (str) – File path. (required)

  • format (str) – ‘fasta’ or ‘genbank’. (default: ‘fasta’)

  • Feature (bool) – If True and the input file is in GenBank format, the feature annotations are also extracted. (default: False)

Returns

sequences – A list of the form [{‘sequence name’: sequence <nucleotide_sequence>}, number of input sequences, average length of the input sequences]. When Feature is True, the form is [{‘sequence name’: [sequence <nucleotide_sequence>, feature]}, number of input sequences, average length of the input sequences].

Return type

list
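To make the documented return shape concrete, here is a minimal FASTA-only reader that produces [{name: sequence}, number of sequences, average length]. It is an illustrative sketch, not shrslib's implementation: read_fasta is a hypothetical name, it returns plain str instead of nucleotide_sequence objects, it does not handle GenBank files or Feature, and a repeated sequence name overwrites the earlier entry.

```python
def read_fasta(path):
    """Return [{name: sequence}, number_of_sequences, average_length] for a (Multi-)FASTA file."""
    records = {}
    name = None
    chunks = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)  # store the previous record
                name = line[1:].strip()
                chunks = []
            else:
                chunks.append(line)  # sequence lines may be wrapped
    if name is not None:
        records[name] = "".join(chunks)  # store the last record
    count = len(records)
    average = sum(len(s) for s in records.values()) / count if count else 0.0
    return [records, count, average]
```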

class shrslib.basicfunc.nucleotide_sequence(sequence)

Bases: str

Nucleotide sequence definition.

Attributes
  • string (str) – Input character string.

  • description (str) – Nucleotide sequence.

  • sequence_length (int) – Nucleotide sequence length (nt).

Decompress()

Expand a sequence containing wobble (degenerate) bases into the list of all nucleotide sequences, composed only of A, T, G, and C, that it represents.

Parameters

None

Returns

sequences

Return type

list
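Expanding wobble bases amounts to taking the Cartesian product of the bases that each IUPAC code stands for. A sketch using itertools.product, where expand_degenerate and the IUPAC table are hypothetical names and the output order is an assumption:

```python
import itertools

# IUPAC codes and the A/C/G/T bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(seq):
    """List every A/C/G/T sequence that a sequence with wobble bases stands for."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in itertools.product(*choices)]
```

The number of expansions is the product of the per-position choices, so it grows quickly: a primer with three N positions already yields 64 sequences.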

Nucleotide_composition()

Calculate the nucleotide composition.
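The return format of Nucleotide_composition is not documented here. As one hedged illustration, the sketch below reports the fraction of each character using collections.Counter; nucleotide_composition is a hypothetical name, and the method itself may return counts or a different structure.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Hypothetical helper: fraction of each character (case-insensitive)."""
    if not seq:
        return {}
    counts = Counter(seq.upper())
    return {base: n / len(seq) for base, n in sorted(counts.items())}
```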

PCR_amplicon(forward, reverse, allowance=0, Single_amplicon=True, Sequence_Only=True, amplicon_size_limit=10000, Warning_ignore=False, circularDNAtemplate=False)

In silico PCR using this sequence as the template.

Parameters
  • forward (str or nucleotide_sequence) – Forward primer sequence (required).

  • reverse (str or nucleotide_sequence) – Reverse primer sequence (required).

  • allowance (int) – The acceptable number of mismatches. (default: 0)

  • Single_amplicon (bool) – If False, all amplicons amplified by the primer set are returned as a list. (default: True)

  • Sequence_Only (bool) – If False, the start and end positions of each amplicon in the template sequence are returned together with the amplicon sequence. (default: True)

  • amplicon_size_limit (int) – The upper limit of amplicon size. (default: 10,000)

  • Warning_ignore (bool) – Show all warnings if True is specified. (default: False)

  • circularDNAtemplate (bool) – Specify True if the template sequence is circular DNA. (default: False)

Returns

PCR_amplicon

Return type

str or list

calculate_Tm_value(conc=0.5, Na_conc=0.05, mode='min')

Calculate the Tm value.

Parameters
  • conc (float) – Primer concentration in the PCR reaction. (default: 0.5 picomole/microliter)

  • Na_conc (float) – Sodium ion concentration in the PCR reaction. (default: 0.05 mole/liter)

  • mode (str) – One of ‘min’, ‘max’, ‘average’, or ‘’ (empty string). (default: ‘min’)

Returns

Tm_value

Return type

numpy.float or list

calculate_flexibility(detailed=False)

Calculate a degeneracy score.

Parameters

detailed (bool) – If True, a more accurate score is calculated. (default: False)

Returns

Flexibility – The higher the score, the more wobble (degenerate) bases the sequence has.

Return type

float

complementary_sequence()

Convert an input sequence to a complementary sequence.

Parameters

None

Returns

New_sequences – The complementary sequence of the sequence in the object.

Return type

str

search_position(evaluating_sequence, allowance=0, interval_distance=0, Match_rate=0.0, circularDNAtemplate=False)

Search for annealing site positions.

Parameters

evaluating_sequence (str or nucleotide_sequence) – The sequence whose annealing site positions you want to identify in the template (self) sequence.

Returns

Start_position – The annealing site positions of the evaluated sequence on the template (self) sequence. The absolute value of each number gives the start position of an annealing site on the template sequence. For example, if evaluating_sequence is “ATGC” and the template is the sequence below, {“ATGC”: [10, −35]} is returned; −35 corresponds to GCAT, the reverse complement of ATGC, and positions are counted from 0.

Template : ttggaatgagATGCtgtgaacagtcgtatatacgcGCATcgagattacgctattcgcgcggcg

Return type

dict
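The sign convention in the example can be reproduced with a short exact-match sketch. It assumes that a positive value marks a match to evaluating_sequence itself, a negative value marks a match to its reverse complement, and positions are 0-based, which agrees with ATGC starting at index 10 in the template above. annealing_positions is a hypothetical name; allowance, interval_distance, Match_rate, and circular templates are not modeled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def annealing_positions(query, template):
    """Exact-match sketch of the documented output format.

    Positive value: 0-based start of a match to query itself.
    Negative value: 0-based start of a match to the reverse complement of query.
    A reverse-complement hit at position 0 would be indistinguishable from a forward hit.
    """
    q = query.upper()
    rc = reverse_complement(q)
    t = template.upper()
    positions = []
    for i in range(len(t) - len(q) + 1):
        window = t[i:i + len(q)]
        if window == q:
            positions.append(i)
        elif window == rc:
            positions.append(-i)
    return {query: positions}
```

Running it on the template above with "ATGC" yields {"ATGC": [10, -35]}, matching the documented result.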

shrslib.explore module

Copyright (c) 2022 Masayuki TAKAHASHI

shrslib.explore.PCR_amplicon(forward, reverse, template, Single_amplicon=True, Sequence_Only=True, amplicon_size_limit=10000, allowance=0, Warning_ignore=False, circularDNAtemplate=False)

In silico PCR.

Parameters
  • forward (str or nucleotide_sequence) – Forward primer sequence (required).

  • reverse (str or nucleotide_sequence) – Reverse primer sequence (required).

  • template (str or nucleotide_sequence) – Template DNA sequence (required).

  • Single_amplicon (bool) – If False, all amplicons amplified by the primer set are returned as a list. (default: True)

  • Sequence_Only (bool) – If False, the start and end positions of each amplicon in the template sequence are returned together with the amplicon sequence. (default: True)

  • amplicon_size_limit (int) – The upper limit of amplicon size. (default: 10,000)

  • allowance (int) – The acceptable mismatch number. (default: 0)

  • Warning_ignore (bool) – Show all warnings if True is specified. (default: False)

  • circularDNAtemplate (bool) – Specify True if the input sequence is circular DNA. (default: False)

Returns

PCR_amplicon

Return type

str or list
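The core idea of in silico PCR can be sketched for the simplest case: exact primer matches on a linear template. The forward primer is searched as written, the reverse primer as its reverse complement on the same strand, and every forward/reverse site pair that fits within amplicon_size_limit yields an amplicon. This sketch (simple_pcr_amplicons and find_all are hypothetical names) ignores allowance, wobble bases, circularDNAtemplate, and products primed in the opposite orientation.

```python
# Complement table for plain A/C/G/T (wobble bases are not handled in this sketch).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_all(pattern, text):
    """0-based start positions of every (possibly overlapping) exact match."""
    return [i for i in range(len(text) - len(pattern) + 1) if text.startswith(pattern, i)]


def simple_pcr_amplicons(forward, reverse, template, amplicon_size_limit=10000):
    """Exact-match in silico PCR on a linear template.

    Returns a list of (start, end, amplicon) tuples; end is exclusive.
    """
    template = template.upper()
    fwd = forward.upper()
    # The reverse primer anneals to the top strand, so its site reads as its reverse complement there.
    rev_site = reverse_complement(reverse.upper())
    amplicons = []
    for f in find_all(fwd, template):
        for r in find_all(rev_site, template):
            end = r + len(rev_site)
            # Keep pairs where the reverse site lies downstream of the forward primer and the size fits.
            if r >= f + len(fwd) and end - f <= amplicon_size_limit:
                amplicons.append((f, end, template[f:end]))
    return amplicons
```

The (start, end, amplicon) tuples loosely mirror the extra information that Sequence_Only=False adds; the exact position convention used by shrslib is not documented here.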

shrslib.explore.search_position(Combination=[], evaluating_sequence='', input_sequence='', allowance=0, interval_distance=0, Position_list=array([], dtype=float64), Match_rate=0.8, Maximum_annealing_site_number=10, circularDNAtemplate=False)

Search for annealing site positions.

Parameters
  • Combination (list) –

  • evaluating_sequence (str or nucleotide_sequence) – The sequence whose annealing site positions you want to identify in input_sequence.

  • input_sequence (str or nucleotide_sequence) – The sequence in which the annealing sites of evaluating_sequence are searched.

Returns

Start_position – The annealing site positions of the evaluated sequence on the input sequence. The absolute value of each number gives the start position of an annealing site on the input sequence. For example, if evaluating_sequence is “ATGC” and input_sequence is the sequence below, {“ATGC”: [10, −35]} is returned; −35 corresponds to GCAT, the reverse complement of ATGC, and positions are counted from 0.

Template : ttggaatgagATGCtgtgaacagtcgtatatacgcGCATcgagattacgctattcgcgcggcg

Return type

dict

shrslib.scores module

Copyright (c) 2022 Masayuki TAKAHASHI

shrslib.scores.array_diff(arr1, arr2, lower, upper, logical=False, separation_key=[])

Calculate the difference between two arrays.

Parameters
  • arr1 (int, float, list, tuple, numpy.ndarray) –

  • arr2 (int, float, list, tuple, numpy.ndarray) –

  • lower (int, float) –

  • upper (int, float) –

  • logical (bool) –

  • separation_key (int, float, list of int, list of float) –

Returns

Result – By default, a list of the differences. When logical is True, a logical value is returned instead.

Return type

list

Notes

Subtract arr1 from arr2. Subtraction is NOT performed across a separation_key value. For example, with arr1 = [1, 1000, 10000], arr2 = [50, 1100, 10150], lower = 40, and upper = 400, the result is [49, 100, 150] when separation_key is the default value (default: −1). When separation_key is 1050, the result is [49, 150], because 1000 and 1100 lie on opposite sides of 1050.
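The Notes example is consistent with more than one reading. The sketch below takes one of them, labeled here as an assumption: every difference arr2[j] - arr1[i] that lies within [lower, upper] is kept, except for pairs separated by a separation_key value. pairwise_diff is a hypothetical name, and the logical option is not modeled.

```python
def pairwise_diff(arr1, arr2, lower, upper, separation_key=()):
    """Hypothetical reading of array_diff: all arr2[j] - arr1[i] within [lower, upper],
    skipping pairs that lie on opposite sides of a separation key."""
    if isinstance(separation_key, (int, float)):
        keys = [separation_key]  # a single key
    else:
        keys = list(separation_key)  # a list of keys (possibly empty)
    result = []
    for a in arr1:
        for b in arr2:
            diff = b - a
            if not (lower <= diff <= upper):
                continue  # outside the accepted range
            lo, hi = min(a, b), max(a, b)
            if any(lo < k < hi for k in keys):
                continue  # a separation key lies between the two values
            result.append(diff)
    return result
```

With the Notes values, this returns [49, 100, 150] without a key (or with −1) and [49, 150] with separation_key=1050.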

shrslib.scores.calculate_diff_length_score(sequence_pair_set, Reverse=False, pair_logical=False, return_sequence_name=False)

Calculate the similarity between two sequences of different lengths. The higher the value, the more similar the two sequences are.

Parameters
  • sequence_pair_set (tuple of two sequences, list of two sequences or list of them) –

  • Reverse (bool) –

  • pair_logical (bool) – If True, return True when the score is 1.

  • return_sequence_name (bool) – If True, return the two input sequences together with their score.

Returns

Similarity_score

Return type

float

shrslib.scores.calculate_flexibility(sequence, detailed=False)

Calculate a degeneracy score.

Parameters
  • sequence (str, list or tuple) –

  • detailed (bool) – If True, a more accurate score is calculated. (default: False)

Returns

Flexibility – The higher the score, the more wobble (degenerate) bases the sequence has.

Return type

float

shrslib.scores.calculate_score(seq1, seq2)

Calculate the similarity between two sequences. The higher the value is, the more similar the two sequences are.

Parameters
  • seq1 –

  • seq2 –

Returns

Similarity_score

Return type

float

shrslib.scores.fragment_size_distance(array, sum=False)

Calculate a distance based on the values in array.

Parameters
  • array (list, tuple or numpy.ndarray) –

  • sum (bool) –

Returns

distance

Return type

float

shrslib.scores.sequence_duplicated(sequence_list, keep='first', local=False, complementary=False)

Identify duplicated sequences in a list, tuple, numpy.ndarray, or pandas.Series.

Parameters
  • sequence_list (list, tuple, numpy.ndarray or pandas.Series) –

  • keep (str) – One of ‘first’, ‘last’, ‘both’, or ‘null’. With ‘first’, the first occurrence of a duplicated sequence is kept and its other occurrences are discarded.

  • local (bool) – If True, whether a sequence is duplicated is judged based on partial sequences.

  • complementary (bool) –

Returns

Result – A list in which True indicates that the corresponding sequence is duplicated.

Return type

list
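Assuming sequence_duplicated follows the same convention as pandas.Series.duplicated (with keep='first', every occurrence after the first is flagged True), the exact-match case can be sketched as below. duplicated_flags is a hypothetical name; ‘both’, ‘null’, local, and complementary are not modeled, and case-insensitive comparison is an assumption.

```python
def duplicated_flags(sequences, keep="first"):
    """Hypothetical pandas-like duplicate flags for exact (case-insensitive) matches.

    keep='first': later copies are flagged; keep='last': earlier copies are flagged.
    """
    if keep not in ("first", "last"):
        raise ValueError("this sketch only models keep='first' or keep='last'")
    items = [s.upper() for s in sequences]
    # Walk forward for 'first', backward for 'last', so the kept copy is seen first.
    order = range(len(items)) if keep == "first" else range(len(items) - 1, -1, -1)
    seen = set()
    flags = [False] * len(items)
    for i in order:
        if items[i] in seen:
            flags[i] = True
        else:
            seen.add(items[i])
    return flags
```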