I have a FASTA file with several sequences, but the header line of every sequence is the same string (ABI). I want to replace each header with the name of a species stored in a separate text file.
My FASTA file looks like this:
>ABI
AGCTAGTCCCGGGTTTATCGGCTATAC
>ABI
ACCCCTTGACTGACATGGTACGATGAC
>ABI
ATTTCGACTGGTGTCGATAGGCAGCAT
>ABI
ACGTGGCTGACATGTATGTAGCGATGA
The species list looks like this:
Alsophila cuspidata
Bunchosia argentea
Miconia cf.gracilis
Meliosma frondosa
How can I replace the ABI headers of my sequences with the names of my species, in that exact order?
Required output:
>Alsophila cuspidata
AGCTAGTCCCGGGTTTATCGGCTATAC
>Bunchosia argentea
ACCCCTTGACTGACATGGTACGATGAC
>Miconia cf.gracilis
ATTTCGACTGGTGTCGATAGGCAGCAT
>Meliosma frondosa
ACGTGGCTGACATGTATGTAGCGATGA
I was using something like:
awk '
FNR==NR{
a[$1]=$2
next
}
($2 in a) && /^>/{
print ">"a[$2]
next
}
1
' spp_list.txt FS="[> ]" all_spp.fasta
This is not working. Could someone guide me, please?
Hello, I'm not a developer, so please don't be rude.
Hope this will help you:
I created a file fasta.txt that contains:
I also created a file spplist.txt that contains:
I then created a Python script named fasta.py. Here it is:
(These three files need to be in the same directory for the script to work as is.)
Here is my directory tree:
To execute the script, open a shell, cd into the directory, and type:
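As a rough illustration of the approach described here, the sketch below walks through the FASTA lines and swaps each header (a line starting with ">") for the next species name, in order. It is only one way such a script could be written, not necessarily the exact fasta.py. The function names rename_headers and rename_file are hypothetical, and it assumes one species name per line and exactly one header per sequence.

```python
from pathlib import Path


def rename_headers(fasta_lines, species):
    """Replace each FASTA header line with the next species name, in order.

    rename_headers is a hypothetical helper name used for illustration.
    """
    names = iter(species)
    renamed = []
    for line in fasta_lines:
        if line.startswith(">"):
            try:
                renamed.append(">" + next(names))
            except StopIteration:
                # Fail loudly instead of silently leaving headers unchanged.
                raise ValueError("more FASTA headers than species names") from None
        else:
            # Sequence lines pass through untouched.
            renamed.append(line)
    return renamed


def rename_file(fasta_path, species_path, output_path):
    """File-based wrapper (hypothetical); paths are supplied by the caller."""
    fasta_lines = Path(fasta_path).read_text().splitlines()
    # Skip blank lines so a trailing newline is not treated as a name.
    species = [
        s.strip()
        for s in Path(species_path).read_text().splitlines()
        if s.strip()
    ]
    result = rename_headers(fasta_lines, species)
    Path(output_path).write_text("\n".join(result) + "\n")


if __name__ == "__main__":
    # In-memory demo using the first two records from the question.
    fasta = [
        ">ABI",
        "AGCTAGTCCCGGGTTTATCGGCTATAC",
        ">ABI",
        "ACCCCTTGACTGACATGGTACGATGAC",
    ]
    species = ["Alsophila cuspidata", "Bunchosia argentea"]
    print("\n".join(rename_headers(fasta, species)))
```

With the file names used in this answer, the call would be rename_file("fasta.txt", "spplist.txt", "output.txt"). Matching by position rather than by key is deliberate: every header is the same string (ABI), so there is nothing to look up, only an order to follow.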
You will see a new file named output.txt in the directory:
and here is its content:
Hope this can help you out. bguess.