Change Fasta names to identifier values with conditional statement; error due to unequal lengths

I am relatively new to R and want to use it for a population genetics class. I have successfully written a script that pulls FASTA files from GenBank by accession number. However, the sequence names are the accession numbers, which are not useful for further analysis, and I would like to change them to the values in an identifier column of my data. I know this is probably an unusual request and appreciate any help; it seems like it should be a trivial issue.

I have tried many solutions and think the most reasonable is a conditional statement: if any of the FASTA names (currently the undescriptive accession numbers) are found in the original data, the value from the identifier column is pasted in as the new name. I tried to make a reprex of the code, which I posted as a screenshot. It throws an error because the values being compared are of different lengths. All I really want to do is find which current names match the accession numbers in the original data and then paste the corresponding ID column value in as the new name for each FASTA sequence.
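The error most likely comes from comparing the sequence names and the data-frame column element by element (for example `==` inside `if()`), which only lines up when both vectors have the same length; since R 4.2, `if()` also errors outright on a condition of length greater than one. A lookup with base R's `match()` avoids the problem: for each name it returns the row of the table holding that accession, regardless of the two lengths. The sketch below is a minimal illustration, not the asker's code: `rename_by_id()` is a hypothetical helper, and the `accession`/`ID` column names and accession values are made up. Because `names()` also works on ape `DNAbin` lists, the same call should apply to sequences read with ape.

```r
# Hypothetical helper: rename the elements of `x` (a named vector or list,
# e.g. an ape DNAbin list) via a lookup table. Column names are assumptions.
rename_by_id <- function(x, lookup, from = "accession", to = "ID") {
  # For each current name, the row of `lookup` holding that accession (NA if
  # none); match() works no matter how long `x` and `lookup` are.
  idx <- match(names(x), lookup[[from]])
  found <- !is.na(idx)
  # Replace only the names that were found; unmatched names are kept.
  names(x)[found] <- as.character(lookup[[to]][idx[found]])
  x
}

# Made-up example data: sequences named by accession, plus a metadata table
# that has more rows than there are sequences.
seqs <- c(AB000001 = "ACGTACGT",
          AB000003 = "GGCCTTAA",
          AB000002 = "TTGACCAA")
meta <- data.frame(accession = c("AB000001", "AB000002", "AB000003", "AB000004"),
                   ID        = c("PopA_1", "PopA_2", "PopB_1", "PopB_2"))

seqs <- rename_by_id(seqs, meta)
print(names(seqs))
# [1] "PopA_1" "PopB_1" "PopA_2"
```

The order of the sequences does not need to match the order of the table, and sequences whose accession is missing from the table simply keep their old name.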

Any help is greatly appreciated!

user23562036

This issue has been resolved. This may not be the easiest way, but I re-made a data frame for each gene containing only the ID and accession number columns, read in the FASTA file, overwrote the sequence names with the IDs, and then overwrote the original FASTA file with the renamed sequences. You will likely need the adegenet and ape packages for some of these transformations. I posted a screenshot of my raw code as an example for others.
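The workflow described in this answer (lookup table, read the FASTA, rename, overwrite the file) can be sketched in base R by rewriting the header lines directly. This is an illustrative alternative, not the poster's code: `rename_fasta_headers()` and the `accession`/`ID` column names are hypothetical, and it assumes the accession is the first word after `>` and is written exactly as in the data frame (including any version suffix such as `.1`). Every header is reduced to a single name, so any description after the accession is dropped. With ape, reading via `read.FASTA()`, assigning new `names()`, and saving with `write.FASTA()` is another route.

```r
# Hypothetical helper: replace accession numbers in FASTA headers with IDs
# from a lookup table, then write the result (by default over the input file).
rename_fasta_headers <- function(path, lookup, from = "accession", to = "ID",
                                 out = path) {
  lines <- readLines(path)
  is_header <- startsWith(lines, ">")
  # Assumption: the accession is the first whitespace-delimited token after ">".
  acc <- sub("^>(\\S+).*$", "\\1", lines[is_header], perl = TRUE)
  idx <- match(acc, lookup[[from]])
  # Unmatched headers keep their accession; matched ones become the bare ID.
  new_names <- ifelse(is.na(idx), acc, as.character(lookup[[to]][idx]))
  lines[is_header] <- paste0(">", new_names)
  writeLines(lines, out)
  invisible(new_names)
}

# Made-up example: a small FASTA file in a temporary location.
fasta <- tempfile(fileext = ".fasta")
writeLines(c(">AB000001.1 some description", "ACGTACGT",
             ">AB000002.1", "TTGACCAA",
             ">AB000009.1", "GGGGCCCC"), fasta)
meta <- data.frame(accession = c("AB000001.1", "AB000002.1"),
                   ID        = c("PopA_1", "PopA_2"))

rename_fasta_headers(fasta, meta)
readLines(fasta)
```

Writing to a separate `out` file first, and only overwriting the original once the result looks right, is a safer habit than overwriting in place.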