consensus score and WSP score in python

805 views Asked by At

if I have 3 DNA sequences and I want to evaluate them by some functions:

 seq1='AG_CT'
 seq2='AG_CT'
 seq3='ACT_T'

How can I calculate the consensus score and the weighted sum of pairs score (WSP score) of this three DNA sequences in python?

consensus score is the sum of the pairwise score between sequences and the consensus sequence, Consensus (A)=sum{l}^{i=1}d(i) l is the lenght of sequence , d is the distance between two bases, example: d(A,B)=2 for A!=B, d(A,-)=d(-,A)=1 for A!='-',0 else. A and B may be 'A or C or G or T ' for the above example

     we calculate distance between seq1 and seq2 then seq1 and seq3 then seq2 and seq3

**seq1 and seq2:**
d(A,A)=0, d(G,G)=0, d(-,-)=0, d(c,c)=0, d(t,t)=0
**seq1 and seq3**:
d(A,A)=0, d(G,C)=2, d(-,T)=1, d(c,-)=1, d(t,t)=0
**seq2 and seq3**:
d(A,A)=0, d(G,C)=2, d(-,T)=1, d(c,-)=1, d(t,t)=0


         seq1= A  G  _  C  T
         seq2= A  G  _  C  T
         seq3= A  C  T  _  T
               0  0  0  0  0
               0  2  1  1  0
               0  2  1  1  0
               ++++++++++++++
               0+ 4+ 2+ 2+ 0= 8

consensus(A)=8

weighted sum of pairs WSP (A) = \sum_{i=1}^{k-1} \sum_{j=i+l}^k \sum_{h=1}^l wij* s( A[ i,h ], [ j,h ] l :lenght of sequence, k number of sequences , wij weight of sequences i and j

s(A,B)=2 for A!=B, s(A,-)=d(-,A)=-1 for A!='-',3 else.all the weight factors are 1.

             seq1= A  G  _  C  T
             seq2= A  G  _  C  T
             seq3= A  C  T  _  T
                   3  3  3  3  3
                   3  2 -1 -1  3
                   3  2 -1 -1  3
                   ++++++++++++++
                  (3+3+3)*1+(3+2+2)*1+(3-1-1)*1+(3-1-1)*1+(3+3+3)*1=9*1+7*1+1*1+1*1+9*1
                   9+7+1+1+9=27

So, WSP score of the three sequences is 27

1

There are 1 answers

1
jonrsharpe On BEST ANSWER

I would approach this as follows. First, create functions to calculate the individual distances and weighted sums:

def distance(a, b):
    """Distance between two bases a and b."""
    if a == b:
        return 0
    elif a == "_" or b == "_":
        return 1
    else:
        return 2

and

def w_sum(a, b, w=1):
    """Calculate the pair sum of bases a and b with weighting w."""
    if a == b:
        return 3 * w
    elif a == "_" or b == "_":
        return -1 * w
    else:
        return 2 * w

Second, create the sets of bases at the same positions using the zip function:

list(zip(seq1, seq2, seq3)) == [('A', 'A', 'A'), 
                                ('G', 'G', 'C'), 
                                ('_', '_', 'T'), 
                                ('C', 'C', '_'), 
                                ('T', 'T', 'T')]

Third, generate the pairs within each position using itertools.combinations:

list(combinations(('G', 'G', 'C'), 2)) == [('G', 'G'), 
                                           ('G', 'C'), 
                                           ('G', 'C')]

Finally, add up the distances and sums:

from itertools import combinations

consensus = 0
wsp = 0
for position in zip(seq1, seq2, seq3): # sets at same position
    for pair in combinations(position, 2): # pairs within set
        consensus+= distance(*pair) # calculate distance
        wsp += w_sum(*pair) # calculate pair sum

Note the use of *pair to unpack the 2-tuples of base pairs into the two arguments of the calculation functions.