shrslib package
shrslib.basicfunc module
Copyright (c) 2022 Masayuki TAKAHASHI
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- shrslib.basicfunc.calculate_Tm_value(sequence, conc=0.5, Na_conc=0.05, mode='min')
Calculate the Tm value.
- Parameters
sequence (str, nucleotide_sequence, list, tuple or numpy.ndarray) – The sequence for which you would like to know the Tm value, or its list, tuple, or ndarray. (required)
conc (float) – The concentration of a primer when PCR amplification is performed. (default: 0.5 picomole/microliter)
Na_conc (float) – The concentration of sodium when PCR amplification is performed. (default: 0.05 mole/liter)
mode (str) – Choose a method from ‘min’, ‘max’, ‘average’, or ‘’. (default: min)
- Returns
Tm_value
- Return type
numpy.float or list
- shrslib.basicfunc.complementary_sequence(sequence)
Convert an input sequence to a complementary sequence.
- Parameters
sequence (str, nucleotide_sequence, list, tuple) –
- Returns
New_sequences – The complementary sequence of the sequence in the object.
- Return type
str
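The entry above does not say whether the result keeps the original orientation or is reversed. As a minimal sketch, and not shrslib's own code, the following assumes the reverse complement is meant and that IUPAC degenerate codes should be complemented too; the name reverse_complement is hypothetical.

```python
# Sketch only: an illustration of the idea, not shrslib's implementation.
# Complement table for A/T/G/C plus the IUPAC degenerate codes, both cases.
_COMPLEMENT = str.maketrans(
    "ATGCRYKMSWBDHVNatgcrykmswbdhvn",
    "TACGYRMKSWVHDBNtacgyrmkswvhdbn",
)


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a sequence, preserving letter case."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```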
- shrslib.basicfunc.read_sequence_file(input, format='fasta', Feature=False)
Read a sequence file. FASTA, Multi-FASTA, and GenBank sequence format files are accepted.
- Parameters
input (str) – File path. (required)
format (str) – ‘fasta’ or ‘genbank’ (default: fasta)
Feature (bool) – When True is specified and the input is in GenBank format, the annotations are extracted. (default: False)
- Returns
sequences – A list is returned in the following form: [{‘sequence name’:sequence <nucleotide_sequence>}, number of input sequences, average length of input sequences]. When True is specified for the Feature argument, the form is [{‘sequence name’:[sequence <nucleotide_sequence>, feature]}, number of input sequences, average length of input sequences].
- Return type
list
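To make the shape of the returned list concrete, here is a minimal sketch that builds the same three-element structure from FASTA text. It is not shrslib's parser: the function name parse_fasta is hypothetical, sequences are plain str rather than nucleotide_sequence, and taking the first word of the header as the sequence name is an assumption.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA text into [{name: sequence}, count, average length]."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first word of the header is the sequence name.
            words = line[1:].split()
            name = words[0] if words else f"unnamed_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[name].append(line)
    sequences = {n: "".join(parts) for n, parts in records.items()}
    count = len(sequences)
    average = sum(len(s) for s in sequences.values()) / count if count else 0.0
    return [sequences, count, average]


example = ">seq1 test\nATGC\n>seq2\nATG\nCAT\n"
print(parse_fasta(example))
# [{'seq1': 'ATGC', 'seq2': 'ATGCAT'}, 2, 5.0]
```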
- class shrslib.basicfunc.nucleotide_sequence(sequence)
Bases: str
Nucleotide sequence definition.
- string
Input character
- Type
str
- description
Nucleotide sequence.
- Type
str
- sequence_length
Nucleotide sequence length (nt).
- Type
int
- Decompress()
Expand a sequence that contains wobble (degenerate) bases into the list of all corresponding sequences written with A, T, G, and C only.
- Parameters
None –
- Returns
sequences
- Return type
list
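The expansion described for Decompress() can be sketched with the standard IUPAC ambiguity codes and itertools.product. This is an illustration under the assumption that the standard code table applies; the function name expand_degenerate is hypothetical and the ordering of the output is not claimed to match shrslib.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(sequence: str) -> list:
    """Return every A/T/G/C-only sequence encoded by a degenerate sequence."""
    choices = [IUPAC[base] for base in sequence.upper()]
    # The Cartesian product picks one concrete base per position.
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("ARGY"))
# ['AAGC', 'AAGT', 'AGGC', 'AGGT']
```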
- Nucleotide_composition()
Calculate the nucleotide composition.
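The return format of Nucleotide_composition() is not documented here. As a rough sketch of what a composition calculation involves, the following assumes the result is the fraction of each base; the function name nucleotide_composition is hypothetical.

```python
from collections import Counter


def nucleotide_composition(sequence: str) -> dict:
    """Return the fraction of each base in a sequence (assumed output form)."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in sorted(counts)}


print(nucleotide_composition("AATGC"))
# {'A': 0.4, 'C': 0.2, 'G': 0.2, 'T': 0.2}
```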
- PCR_amplicon(forward, reverse, allowance=0, Single_amplicon=True, Sequence_Only=True, amplicon_size_limit=10000, Warning_ignore=False, circularDNAtemplate=False)
In silico PCR.
- Parameters
forward (str or nucleotide_sequence) – Forward primer sequence (required).
reverse (str or nucleotide_sequence) – Reverse primer sequence (required).
allowance (int) – The acceptable mismatch number. (default: 0)
Single_amplicon (bool) – When False is specified, all amplicons that the input primer set can amplify are returned as a list. (default: True)
Sequence_Only (bool) – When False is specified, the start and end positions of each amplicon in the template sequence are returned together with the amplicon sequence. (default: True)
amplicon_size_limit (int) – The upper limit of amplicon size. (default: 10,000)
Warning_ignore (bool) – Show all warnings if True is specified. (default: False)
circularDNAtemplate (bool) – Specify True if the input sequence is circular DNA. (default: False)
- Returns
PCR_amplicon
- Return type
str or list
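The idea behind in silico PCR can be sketched as follows: locate the forward primer on the template, locate the reverse complement of the reverse primer downstream of it, and return the span between them. This is a simplified illustration, not shrslib's algorithm; it assumes exact matches, a linear template, and a single amplicon, and the names revcomp and simple_pcr are hypothetical.

```python
_COMPLEMENT = str.maketrans("ATGC", "TACG")


def revcomp(sequence: str) -> str:
    """Reverse complement of an uppercase A/T/G/C sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def simple_pcr(forward, reverse, template, amplicon_size_limit=10000):
    """Return the first exact-match amplicon on a linear template, or None."""
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site reads as the primer's reverse complement.
    reverse_site = revcomp(reverse.upper())
    start = template.find(forward)
    while start != -1:
        site = template.find(reverse_site, start + len(forward))
        if site != -1:
            end = site + len(reverse_site)
            if end - start <= amplicon_size_limit:
                return template[start:end]
        start = template.find(forward, start + 1)
    return None


template = "CCATGCAAAATTTTGTACCGG"
print(simple_pcr("ATGCA", "GGTAC", template))  # ATGCAAAATTTTGTACC
```

The amplicon includes both primer sites, and lowering amplicon_size_limit below the amplicon length (17 here) makes the sketch return None.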
- calculate_Tm_value(conc=0.5, Na_conc=0.05, mode='min')
Calculate the Tm value.
- Parameters
conc (float) – The concentration of a primer when PCR amplification is performed. (default: 0.5 picomole/microliter)
Na_conc (float) – The concentration of sodium when PCR amplification is performed. (default: 0.05 mole/liter)
mode (str) – Choose a method from ‘min’, ‘max’, ‘average’, or ‘’. (default: min)
- Returns
Tm_value
- Return type
numpy.float or list
- calculate_flexibility(detailed=False)
Calculate the score for degeneracy.
- Parameters
detailed (bool) – When True is specified, the score is more accurate. (default: False)
- Returns
Flexibility – The higher the score is, the more wobble base pairs the sequence has.
- Return type
float
- complementary_sequence()
Convert an input sequence to a complementary sequence.
- Parameters
None –
- Returns
New_sequences – The complementary sequence of the sequence in the object.
- Return type
str
- search_position(evaluating_sequence, allowance=0, interval_distance=0, Match_rate=0.0, circularDNAtemplate=False)
Search for the annealing site position.
- Parameters
evaluating_sequence (str or nucleotide_sequence) – The sequence for which you want to identify the annealing site position in a template sequence.
- Returns
Start_position – The annealing site position between an evaluated sequence and a template (self) sequence. The absolute value of a position number indicates the onset point of the annealing site in the template sequence. For example, if the “evaluating_sequence” is “ATGC”, {“ATGC”: [10, −35]} is obtained for the template below, in which “ATGC” starts at position 10 and its reverse complement “GCAT” starts at position 35 (counting from 0).
Template : ttggaatgagATGCtgtgaacagtcgtatatacgcGCATcgagattacgctattcgcgcggcg
- Return type
dict
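The example above can be reproduced with a short sketch. It assumes 0-based positions, exact matching, and that a negative sign marks a site where the reverse complement of the evaluated sequence is found, which is what the example output suggests; the helper names are hypothetical and this is not shrslib's implementation.

```python
_COMPLEMENT = str.maketrans("ATGC", "TACG")


def revcomp(sequence: str) -> str:
    """Reverse complement of an uppercase A/T/G/C sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def find_all(template: str, query: str) -> list:
    """Return every 0-based start index of query in template."""
    positions = []
    index = template.find(query)
    while index != -1:
        positions.append(index)
        index = template.find(query, index + 1)
    return positions


def search_position_sketch(evaluating_sequence: str, template: str) -> dict:
    """Positive positions: direct matches. Negative: reverse-complement matches."""
    template = template.upper()
    query = evaluating_sequence.upper()
    forward_hits = find_all(template, query)
    reverse_hits = [-p for p in find_all(template, revcomp(query))]
    return {evaluating_sequence: forward_hits + reverse_hits}


template = "ttggaatgagATGCtgtgaacagtcgtatatacgcGCATcgagattacgctattcgcgcggcg"
print(search_position_sketch("ATGC", template))  # {'ATGC': [10, -35]}
```

In this sketch a palindromic query is reported twice (once per strand), and a reverse-strand hit at position 0 cannot be told apart from a forward hit, since negating 0 changes nothing.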
shrslib.explore module
- shrslib.explore.PCR_amplicon(forward, reverse, template, Single_amplicon=True, Sequence_Only=True, amplicon_size_limit=10000, allowance=0, Warning_ignore=False, circularDNAtemplate=False)
In silico PCR.
- Parameters
forward (str or nucleotide_sequence) – Forward primer sequence (required).
reverse (str or nucleotide_sequence) – Reverse primer sequence (required).
template (str or nucleotide_sequence) – Template DNA sequence (required).
Single_amplicon (bool) – When False is specified, all amplicons that the input primer set can amplify are returned as a list. (default: True)
Sequence_Only (bool) – When False is specified, the start and end positions of each amplicon in the template sequence are returned together with the amplicon sequence. (default: True)
amplicon_size_limit (int) – The upper limit of amplicon size. (default: 10,000)
allowance (int) – The acceptable mismatch number. (default: 0)
Warning_ignore (bool) – Show all warnings if True is specified. (default: False)
circularDNAtemplate (bool) – Specify True if the input sequence is circular DNA. (default: False)
- Returns
PCR_amplicon
- Return type
str or list
- shrslib.explore.search_position(Combination=[], evaluating_sequence='', input_sequence='', allowance=0, interval_distance=0, Position_list=array([], dtype=float64), Match_rate=0.8, Maximum_annealing_site_number=10, circularDNAtemplate=False)
Search for the annealing site position.
- Parameters
Combination (list) –
evaluating_sequence (str or nucleotide_sequence) – The sequence for which you want to identify the annealing site position in the input sequence.
input_sequence (str or nucleotide_sequence) – The sequence in which the annealing site is searched.
- Returns
Start_position – The annealing site position between an evaluated sequence and an input sequence. The absolute value of a position number indicates the onset point of the annealing site in the input sequence. For example, if the “evaluating_sequence” is “ATGC” and the “input_sequence” is the sequence below, {“ATGC”: [10, −35]} is obtained; “ATGC” starts at position 10 and its reverse complement “GCAT” starts at position 35 (counting from 0).
Template : ttggaatgagATGCtgtgaacagtcgtatatacgcGCATcgagattacgctattcgcgcggcg
- Return type
dict
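The allowance argument is described elsewhere in this reference as the acceptable mismatch number. Assuming it has the same meaning here, mismatch-tolerant matching can be sketched as a sliding window that counts mismatches at each position. This is an illustration on one strand only, with hypothetical function names, not shrslib's implementation.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_with_allowance(query: str, template: str, allowance: int = 0) -> list:
    """Return 0-based starts where query matches with at most `allowance` mismatches."""
    query, template = query.upper(), template.upper()
    width = len(query)
    return [
        start
        for start in range(len(template) - width + 1)
        if count_mismatches(query, template[start:start + width]) <= allowance
    ]


template = "TTATGCTTATGGTT"
print(find_with_allowance("ATGC", template))               # [2]
print(find_with_allowance("ATGC", template, allowance=1))  # [2, 8]
```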
shrslib.scores module
- shrslib.scores.array_diff(arr1, arr2, lower, upper, logical=False, separation_key=[])
Calculate the difference between two arrays.
- Parameters
arr1 (int, float, list, tuple, numpy.ndarray) –
arr2 (int, float, list, tuple, numpy.ndarray) –
lower (int, float) –
upper (int, float) –
logical (bool) –
separation_key (int, float, list of int, list of float) –
- Returns
Result – By default, a list containing the differences is returned. When logical is True, a logical value is returned instead.
- Return type
list
Notes
arr1 is subtracted from arr2. Subtraction across a separation_key value is NOT performed. For example, suppose arr1 is [1, 1000, 10000], arr2 is [50, 1100, 10150], lower is 40, and upper is 400. With the default separation_key (default: −1), [49, 100, 150] is obtained. With a separation_key of 1050, which lies between 1000 and 1100, [49, 150] is obtained.
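One reading of the Notes can be sketched as follows. The sketch assumes that every value of arr1 is paired with every value of arr2, that lower and upper are inclusive bounds on the difference, and that a pair is skipped when a separation key lies strictly between its two values. These are assumptions that happen to reproduce the documented example; the function name array_diff_sketch is hypothetical.

```python
def array_diff_sketch(arr1, arr2, lower, upper, separation_key=()):
    """Differences arr2 - arr1 within [lower, upper], not crossing any key."""
    result = []
    for a in arr1:
        for b in arr2:
            diff = b - a
            if not (lower <= diff <= upper):
                continue
            # Skip pairs whose interval spans a separation key.
            if any(a < key < b for key in separation_key):
                continue
            result.append(diff)
    return result


arr1 = [1, 1000, 10000]
arr2 = [50, 1100, 10150]
print(array_diff_sketch(arr1, arr2, 40, 400))          # [49, 100, 150]
print(array_diff_sketch(arr1, arr2, 40, 400, [1050]))  # [49, 150]
```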
- shrslib.scores.calculate_diff_length_score(sequence_pair_set, Reverse=False, pair_logical=False, return_sequence_name=False)
Calculate similarity between two different length sequences. The higher the value is, the more similar the two sequences are.
- Parameters
sequence_pair_set (tuple of two sequences, list of two sequences or list of them) –
Reverse (bool) –
pair_logical (bool) – When ‘pair_logical’ is True, return True if the score is 1.
return_sequence_name (bool) – When ‘return_sequence_name’ is True, return the two input sequences and their score.
- Returns
Similarity_score
- Return type
float
- shrslib.scores.calculate_flexibility(sequence, detailed=False)
Calculate the score for degeneracy.
- Parameters
sequence (str, list or tuple) –
detailed (bool) – When True is specified, the score is more accurate. (default: False)
- Returns
Flexibility – The higher the score is, the more wobble base pairs the sequence has.
- Return type
float
- shrslib.scores.calculate_score(seq1, seq2)
Calculate the similarity between two sequences. The higher the value is, the more similar the two sequences are.
- Parameters
seq1 (str or nucleotide_sequence) –
seq2 (str or nucleotide_sequence) –
- Returns
Similarity_score
- Return type
float
- shrslib.scores.fragment_size_distance(array, sum=False)
Calculate the distance based on the values in an array.
- Parameters
array (list, tuple or numpy.ndarray) –
sum (bool) –
- Returns
distance
- Return type
float
- shrslib.scores.sequence_duplicated(sequence_list, keep='first', local=False, complementary=False)
Extract duplicated sequences in list, tuple, numpy.ndarray or pandas.Series.
- Parameters
sequence_list (list, tuple, numpy.ndarray or pandas.Series) –
keep (str) – Choose ‘first’, ‘last’, ‘both’, or ‘null’. ‘first’ means that the first sequence in each set of duplicates remains and the others are discarded.
local (bool) – When local is True, judge whether a sequence is duplicated based on a partial sequence.
complementary (bool) –
- Returns
Result – Return True when the sequence is duplicated.
- Return type
list
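The ‘first’ behaviour described above resembles the keep convention of pandas.Series.duplicated. As a minimal sketch, assuming exact string equality and covering only ‘first’ and ‘last’ (the meaning of ‘both’ and ‘null’ is not documented here), the flags could be computed like this; the function name duplicated_flags is hypothetical.

```python
def duplicated_flags(sequences, keep="first"):
    """Flag duplicates with True, sparing the first (or last) occurrence."""
    if keep not in ("first", "last"):
        raise ValueError("this sketch only covers keep='first' and keep='last'")
    indices = range(len(sequences))
    if keep == "last":
        # Walk backwards so the last occurrence is the one that is spared.
        indices = reversed(indices)
    seen = set()
    flags = [False] * len(sequences)
    for i in indices:
        if sequences[i] in seen:
            flags[i] = True
        else:
            seen.add(sequences[i])
    return flags


primers = ["ATGC", "GGCC", "ATGC", "ATGC"]
print(duplicated_flags(primers))               # [False, False, True, True]
print(duplicated_flags(primers, keep="last"))  # [True, False, True, False]
```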