How to do multiple sequence alignment for text strings


I have been trying to implement the following paper. However, Section 3.4 of the paper calls for multiple sequence alignment (MSA) on several text strings using ClustalW, which I am unsure how to do.

Every ClustalW package I have found accepts only ATGC input strings; the algorithm doesn't seem to be designed for other letters. The workarounds I have considered each run into a problem:

  1. The other algorithms I have found, such as Needleman-Wunsch, only do pairwise alignment.
  2. Pairwise sequence alignment alone is not an option, as it will not give us the global optimum across all the strings.
  3. Encoding each text character as a string of ATGC letters will not work either: the encodings can have different lengths, and a standard MSA may insert gaps inside an encoding, which would split a character, and a character cannot be split.

Does MSA only exist for bioinformatics, or has anyone applied it to ordinary text? Can someone suggest another method for multiple sequence alignment on text strings, or a way to get around these problems?
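To make point 1 concrete, here is a minimal sketch of Needleman-Wunsch working directly on arbitrary characters, so the pairwise step itself is not tied to ATGC. The function name, the scoring values (match +1, mismatch -1, gap -1) and the gap symbol are my own assumptions for illustration, not taken from the paper or from ClustalW. It only aligns two strings, so it does not solve the multiple alignment problem on its own.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1, gap_char="\u2205"):
    """Globally align two strings over any alphabet.

    Returns (aligned_a, aligned_b, score). gap_char is an assumed
    placeholder symbol that should not occur in the input text.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    x, y, s = needleman_wunsch("hello world", "hello wrld")
    print(x)
    print(y)
    print("score:", s)
```

Each character is treated as one indivisible symbol, so the splitting problem from point 3 does not arise here; what is still missing is a way to combine more than two strings.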