This seems like a simple task, but getting it to work is proving more difficult than I thought:
I have a FASTA file containing several million lines of text (but only a few hundred individual sequence entries), and the sequence names are long. I want to replace all characters after the header marker > with Contig $n, where $n is an integer that starts at 1 and is incremented for each replacement.
An example input sequence name:
>NODE:345643RD:Cov_456:GC47:34thgd
ATGTCGATGCGT
>NODE...
ATGCGCTTACAC
Which I would then like to output like this:
>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC
So maybe a Perl script? I know some basics, but I'd like to read in a file and write out a new file with the changes, and I'm unsure of the best way to do this. I've seen some Perl one-liner examples, but none did what I wanted.
$n = 1
if {
s/>.*/(Contig)++$n/e
++$n
}
Try something like this: