Edit line names with a new name containing an incremented value

Question

Asked by brandon scott on 10 June 2015 at 19:36 (151 views)

This seems like a simple task, but getting it to work is turning out to be more difficult than I thought.

I have a FASTA file containing several million lines of text (but only a few hundred individual sequence entries), and the sequence names are long. I want to replace all characters after the header marker > with Contig $n, where $n is an integer that starts at 1 and is incremented for each replacement.

An example input sequence name:

>NODE:345643RD:Cov_456:GC47:34thgd
ATGTCGATGCGT
>NODE...
ATGCGCTTACAC

I then want the output to look like this:

>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC

So maybe a Perl script? I know some basics, but I'd like to read in a file and then write a new file with the changes, and I'm unsure of the best way to do this. I've seen some Perl one-liner examples, but none did what I wanted.

$n = 1

if { 

    s/>.*/(Contig)++$n/e

    ++$n
}

Original Q&A

There are 5 answers

**shivams** · Answer 1 · 2015-06-10T19:47:40+00:00

Try something like this:

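#!/usr/bin/perl

use strict;
use warnings;

# Open the input for reading and the output for writing; stop if either fails
open(my $fh,  '<', 'example.txt')  or die "Cannot open example.txt: $!";
open(my $fh1, '>', 'example2.txt') or die "Cannot open example2.txt: $!";

my $n = 1;

# For each line of the input file
while (my $line = <$fh>) {

    # Try to update the name; if the substitution succeeded, increment $n
    if ($line =~ s/^>.*/>Contig $n/) { $n++; }

    print $fh1 $line;
}

close($fh);
close($fh1) or die "Cannot close example2.txt: $!";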

**josifoski** · Answer 2 · 2015-06-10T20:00:48+00:00

I'm no awk expert (far from it), but I solved this out of curiosity, and because sed has no variables, which limits what it can do here.

One possible awk (gawk) solution could be:

awk -v n=1 '/^>/{print ">Contig " n; n++; next}1' <file
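
Since the question asks for a new file rather than output on the terminal, here is a minimal sketch of the same awk command with its output redirected. The temporary directory and the names in.fasta and out.fasta are assumptions made up for the demo, as is the two-record sample data.

```shell
# Work in a throwaway directory (hypothetical file names, demo data only).
workdir=$(mktemp -d)
printf '>NODE_a\nATGTCGATGCGT\n>NODE_b\nATGCGCTTACAC\n' > "$workdir/in.fasta"

# /^>/ matches header lines: print the new name, bump n, and skip to the next line.
# The trailing 1 is an always-true pattern, so every other line is printed unchanged.
awk -v n=1 '/^>/{print ">Contig " n; n++; next}1' "$workdir/in.fasta" > "$workdir/out.fasta"

cat "$workdir/out.fasta"
```

The input file is only read, so the original headers stay available if you need to map the new names back to the old ones.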

**stevieb** · Answer 3 · 2015-06-10T20:04:12+00:00

perl -i -pe 's/>.*/">Contig " . ++$c/e;' file.txt

Output:

>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC

**Ed Morton** · Answer 4 · 2015-06-10T20:06:20+00:00

$ awk '/^>/{$0=">Contig "++n} 1' file
>Contig 1
ATGTCGATGCGT
>Contig 2
ATGCGCTTACAC

**mob** · Answer 5 · 2015-06-10T20:07:52+00:00

When you use the /e modifier, Perl expects the replacement part of the substitution to be a valid Perl expression. Try something like:

s/>.*/">Contig " . ++$n/e
