I have been trying to implement the following paper: However, Section 3.4 of this paper refers to performing multiple sequence alignment (MSA) on multiple text strings using ClustalW, which I am unsure how to do.
The ClustalW packages I have found only accept ATGC input strings, and the algorithm does not seem to be designed for other letters. The alternatives I have considered also have problems:
- The other algorithms I know of, such as Needleman-Wunsch, only do pairwise alignment.
- Combining pairwise alignments is not an option, as this will not give us the global optimum across all the strings.
- Encoding each text string as ATGC will not work either: the encodings would have different lengths, and a standard MSA could split an encoding with gaps, which makes no sense because a character cannot be split.

Does MSA exist only in bioinformatics, or has anyone applied it to ordinary text? Can someone suggest another method for multiple sequence alignment, or a way to get around these problems?
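
To make the question concrete, here is a minimal sketch of the kind of thing I am after. It is not ClustalW and not the paper's method: it is a center-star heuristic built on Needleman-Wunsch that compares characters directly, so no ATGC encoding is needed. The function names (`needleman_wunsch`, `merge`, `center_star`), the scoring values, and the use of `-` as the gap symbol (assumed not to occur in the input) are all my own assumptions. Like any heuristic of this kind, it does not guarantee the global optimum.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap_penalty=-1, gap="-"):
    """Global pairwise alignment of two strings over any alphabet."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        score[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap_penalty,
                              score[i][j - 1] + gap_penalty)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_penalty:
            out_a.append(a[i - 1]); out_b.append(gap); i -= 1
        else:
            out_a.append(gap); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def merge(msa, center_aln, other_aln, gap="-"):
    """Add one pairwise alignment (center vs. other) to an MSA whose row 0 is the center."""
    rows = [[] for _ in msa]
    new_row = []
    i = j = 0
    while i < len(msa[0]) or j < len(center_aln):
        if i < len(msa[0]) and msa[0][i] == gap:
            # Gap column already present in the MSA: the new string gets a gap too.
            for r, row in enumerate(msa):
                rows[r].append(row[i])
            new_row.append(gap)
            i += 1
        elif j < len(center_aln) and center_aln[j] == gap:
            # The new string inserts a character: open a gap column in every old row.
            for r in range(len(msa)):
                rows[r].append(gap)
            new_row.append(other_aln[j])
            j += 1
        else:
            # Both pointers sit on the same center character.
            for r, row in enumerate(msa):
                rows[r].append(row[i])
            new_row.append(other_aln[j])
            i += 1
            j += 1
    return ["".join(r) for r in rows] + ["".join(new_row)]


def center_star(seqs, gap="-"):
    """Heuristic MSA: align every string to the one most similar to all the others."""
    n = len(seqs)
    totals = [sum(needleman_wunsch(seqs[i], seqs[j], gap=gap)[2]
                  for j in range(n) if j != i) for i in range(n)]
    c = max(range(n), key=totals.__getitem__)
    msa, order = [seqs[c]], [c]
    for k in range(n):
        if k == c:
            continue
        center_aln, other_aln, _ = needleman_wunsch(seqs[c], seqs[k], gap=gap)
        msa = merge(msa, center_aln, other_aln, gap)
        order.append(k)
    # Return the aligned rows in the same order as the input strings.
    result = [None] * n
    for row, idx in zip(msa, order):
        result[idx] = row
    return result


seqs = ["the quick brown fox", "the quick fox", "a quick brown fox"]
for row in center_star(seqs):
    print(row)
```

Every output row has the same length, and removing the gap symbol from a row gives back the original string. What I do not know is whether something like this is acceptable as a substitute for the ClustalW step in the paper.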