if I have 3 DNA sequences and I want to evaluate them by some functions:
seq1='AG_CT'
seq2='AG_CT'
seq3='ACT_T'
How can I calculate the consensus score and the weighted sum of pairs score (WSP score) of this three DNA sequences in python?
consensus score is the sum of the pairwise score between sequences and the consensus sequence, Consensus (A)=sum{l}^{i=1}d(i) l is the lenght of sequence , d is the distance between two bases, example: d(A,B)=2 for A!=B, d(A,-)=d(-,A)=1 for A!='-',0 else. A and B may be 'A or C or G or T ' for the above example
we calculate distance between seq1 and seq2 then seq1 and seq3 then seq2 and seq3
**seq1 and seq2:**
d(A,A)=0, d(G,G)=0, d(-,-)=0, d(c,c)=0, d(t,t)=0
**seq1 and seq3**:
d(A,A)=0, d(G,C)=2, d(-,T)=1, d(c,-)=1, d(t,t)=0
**seq2 and seq3**:
d(A,A)=0, d(G,C)=2, d(-,T)=1, d(c,-)=1, d(t,t)=0
seq1= A G _ C T
seq2= A G _ C T
seq3= A C T _ T
0 0 0 0 0
0 2 1 1 0
0 2 1 1 0
++++++++++++++
0+ 4+ 2+ 2+ 0= 8
consensus(A)=8
weighted sum of pairs WSP (A) = \sum_{i=1}^{k-1} \sum_{j=i+l}^k \sum_{h=1}^l wij* s( A[ i,h ], [ j,h ] l :lenght of sequence, k number of sequences , wij weight of sequences i and j
s(A,B)=2 for A!=B, s(A,-)=d(-,A)=-1 for A!='-',3 else.all the weight factors are 1.
seq1= A G _ C T
seq2= A G _ C T
seq3= A C T _ T
3 3 3 3 3
3 2 -1 -1 3
3 2 -1 -1 3
++++++++++++++
(3+3+3)*1+(3+2+2)*1+(3-1-1)*1+(3-1-1)*1+(3+3+3)*1=9*1+7*1+1*1+1*1+9*1
9+7+1+1+9=27
So, WSP score of the three sequences is 27
I would approach this as follows. First, create functions to calculate the individual distances and weighted sums:
and
Second, create the sets of bases at the same positions using the
zip
function:Third, generate the pairs within each position using
itertools.combinations
:Finally, add up the distances and sums:
Note the use of
*pair
to unpack the 2-tuples of base pairs into the two arguments of the calculation functions.