How can I modify the Smith-Waterman algorithm using a substitution matrix to align proteins in Perl?
[citations needed]
I'm actually a bioinformatics researcher, and one who is waiting for his own code to run, so I'll attempt to answer your question even though it's rather poorly posed.
I'm not sure why you think you need to "modify" the Smith-Waterman algorithm. The only thing it needs in order to align proteins instead of DNA is a substitution matrix for proteins. Look into BLOSUM or PAM. Both are derived from the observed substitution frequencies of amino acid pairs in curated alignments of related proteins: PAM from Margaret Dayhoff's alignments of closely related proteins in the 1970s, and BLOSUM from conserved blocks of protein families (Henikoff and Henikoff, 1992).
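To make that concrete, here is a minimal plain-Perl sketch (no BioPerl) of Smith-Waterman in which the substitution matrix is simply an argument. Assumptions: a linear gap penalty (real protein aligners usually use affine gaps), a matrix parsed from the whitespace-separated layout NCBI uses for its matrix files, and a six-residue excerpt of BLOSUM62 copied by hand for illustration, so check it against the official file and load the full matrix for real work. `parse_matrix` and `smith_waterman` are names made up for this example.

```perl
use strict;
use warnings;

# Hand-copied six-residue excerpt of BLOSUM62 (hydrophobic I, L, V;
# charged D, E, K). For illustration only; use the full matrix file.
my $matrix_text = <<'END';
# lines starting with '#' are comments, as in NCBI matrix files
   I  L  V  D  E  K
I  4  2  3 -3 -3 -3
L  2  4  1 -4 -3 -2
V  3  1  4 -3 -2 -2
D -3 -4 -3  6  2 -1
E -3 -3 -2  2  5  1
K -3 -2 -2 -1  1  5
END

# Parse the matrix text into a hash of hashes: $sub->{$row}{$col} = score.
sub parse_matrix {
    my ($text) = @_;
    my @lines = grep { /\S/ && !/^\s*#/ } split /\n/, $text;
    my @cols  = split ' ', shift @lines;    # header row: residue letters
    my %sub;
    for my $line (@lines) {
        my ($row, @scores) = split ' ', $line;
        @{ $sub{$row} }{@cols} = @scores;
    }
    return \%sub;
}

# Smith-Waterman local alignment with linear gap penalty $gap.
# Returns (best score, aligned part of $s, aligned part of $t).
sub smith_waterman {
    my ($s, $t, $sub, $gap) = @_;
    my @r1 = split //, $s;
    my @r2 = split //, $t;
    my (@H, @ptr);                       # score matrix and traceback pointers
    my ($best, $bi, $bj) = (0, 0, 0);
    $H[$_][0] = 0 for 0 .. @r1;
    $H[0][$_] = 0 for 0 .. @r2;
    for my $i (1 .. @r1) {
        for my $j (1 .. @r2) {
            my $m = $sub->{ $r1[$i-1] }{ $r2[$j-1] }
                // die "no score for $r1[$i-1]/$r2[$j-1]\n";
            my $diag = $H[$i-1][$j-1] + $m;    # align the two residues
            my $up   = $H[$i-1][$j] - $gap;    # gap in $t
            my $left = $H[$i][$j-1] - $gap;    # gap in $s
            my ($h, $p) = (0, '');             # 0 = start a new local alignment
            ($h, $p) = ($diag, 'D') if $diag > $h;
            ($h, $p) = ($up,   'U') if $up   > $h;
            ($h, $p) = ($left, 'L') if $left > $h;
            ($H[$i][$j], $ptr[$i][$j]) = ($h, $p);
            ($best, $bi, $bj) = ($h, $i, $j) if $h > $best;
        }
    }
    # Trace back from the best cell until a zero cell is reached.
    my ($x, $y) = ('', '');
    my ($i, $j) = ($bi, $bj);
    while ($i > 0 && $j > 0 && $H[$i][$j] > 0) {
        my $p = $ptr[$i][$j];
        if    ($p eq 'D') { $x = $r1[--$i] . $x; $y = $r2[--$j] . $y }
        elsif ($p eq 'U') { $x = $r1[--$i] . $x; $y = '-' . $y }
        else              { $x = '-' . $x;       $y = $r2[--$j] . $y }
    }
    return ($best, $x, $y);
}

my $blosum = parse_matrix($matrix_text);
my ($score, $aln_s, $aln_t) = smith_waterman('KIVDE', 'KIDE', $blosum, 4);
print "$score\n$aln_s\n$aln_t\n";    # prints 16, KIVDE, KI-DE
```

Swapping in PAM or another BLOSUM matrix only changes `$matrix_text`; the alignment code stays the same.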
Constructing a substitution matrix for protein sequences is much more complicated than for DNA sequences: there are 20 amino acids instead of 4 nucleotides, and how often one replaces another depends on their chemical properties. For example, you'd expect one hydrophilic amino acid to substitute for another relatively often, because it can frequently do so without causing the protein to lose function. You wouldn't expect a hydrophobic amino acid to substitute for a hydrophilic one nearly as often, because that would change the protein's structure more drastically.
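For context, the entries of BLOSUM and PAM are log-odds scores. In general form (the logarithm base, the scaling, and how unordered pairs are counted differ between the two families), the score for aligning amino acids $a$ and $b$ is:

$$
s(a, b) = \frac{1}{\lambda} \log \frac{q_{ab}}{p_a \, p_b}
$$

Here $q_{ab}$ is the frequency with which $a$ is observed aligned to $b$ in the reference alignments, $p_a$ and $p_b$ are the background frequencies of the two residues, and $\lambda$ is a scaling constant; the result is rounded to an integer. A pair substituted more often than chance would predict, such as two similar hydrophilic residues, gets a positive score, while a pair seen less often than chance, such as a hydrophobic residue replacing a hydrophilic one, gets a negative score.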
If you treat the substitution matrix as an input rather than as part of the algorithm, Smith-Waterman is really a general string alignment algorithm, even though it is usually applied to DNA or protein sequences.
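To illustrate, here is a score-only sketch in which the scoring scheme is a Perl code reference instead of a protein matrix, so the same function compares DNA or ordinary text. The `sw_score` name and the +2/-1 match/mismatch scheme are made up for this example; it assumes a linear gap penalty and keeps only two rows of the DP table, since the score alone needs no traceback.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Smith-Waterman score only; $score is a code ref taking two characters.
# Nothing here knows about amino acids: any two strings can be compared.
sub sw_score {
    my ($s, $t, $score, $gap) = @_;
    my @r1 = split //, $s;
    my @r2 = split //, $t;
    my @prev = (0) x (@r2 + 1);    # row i-1 of the DP table
    my $best = 0;
    for my $i (1 .. @r1) {
        my @cur = (0);             # column 0 is always 0 in local alignment
        for my $j (1 .. @r2) {
            $cur[$j] = max(0,
                $prev[$j-1] + $score->($r1[$i-1], $r2[$j-1]),  # match/mismatch
                $prev[$j]   - $gap,                            # gap in $t
                $cur[$j-1]  - $gap);                           # gap in $s
            $best = $cur[$j] if $cur[$j] > $best;
        }
        @prev = @cur;
    }
    return $best;
}

# Illustrative scheme: +2 for identical characters, -1 otherwise.
my $identity = sub { $_[0] eq $_[1] ? 2 : -1 };

print sw_score('ACGTTGCA', 'TTGC', $identity, 2), "\n";                  # prints 8
print sw_score('the quick brown fox', 'quack brown', $identity, 2), "\n"; # prints 19
```

Passing a closure that looks up a hash-of-hashes substitution matrix (for example `sub { $sub->{$_[0]}{$_[1]} }`) turns the same function into a protein aligner.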
Maybe start with Bio::Tools::pSW, try to modify it the way you want, and ask specific questions if you run into difficulty.