I have a long file with provisional SNP IDs and alleles, which looks like this:
14_611646T,C
14_881226CT,C
14_861416.1GGC,GGCGCGCGCG
I would like to split each line after its last digit, separating the SNP ID from the alleles, so that it looks like this:
14_611646 T,C
14_881226 CT,C
14_861416.1 GGC,GGCGCGCGCG
I tried both awk and sed, but the underscore keeps causing problems. For example:
sed 's/^[0-9][0-9]*/& /' File1 > File2
gave me
14 _611646T,C
14 _881226CT,C
14 _861416.1GGC,GGCGCGCGCG
Can anyone help me?
First, think about the smartest way to achieve this.
It's better to avoid a regex that matches the whole line; instead, find the portion that needs to change.
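One way to find that portion is to print only what a pattern matches. The sketch below assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX) and uses the first sample ID from the question. The variable names are made up for illustration.

```shell
# Sample line taken from the question.
line='14_611646T,C'

# The pattern from the question stops at the first non-digit, the underscore.
first=$(printf '%s\n' "$line" | grep -oE '^[0-9]+')
echo "$first"

# Adding '_' and '.' to the bracket expression covers the whole ID.
whole=$(printf '%s\n' "$line" | grep -oE '^[0-9_.]+')
echo "$whole"
```

The first pattern selects only the leading 14, which is why the space ended up before the underscore.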
Using sed with -E (aka extended regular expressions):
Yields:
The regular expression matches as follows:
^[0-9_.]+ matches one or more digits, underscores, or dots at the start of the line.
In the right part of sed's substitution, & is what matched in the left part.
Bonus
[[:upper:]] is a POSIX character class that matches any upper-case letter.
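A minimal sketch that puts these pieces together, assuming a sed that accepts -E (GNU and BSD sed do). The temporary file and variable names are only for illustration; the sample lines are the ones from the question.

```shell
# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' '14_611646T,C' '14_881226CT,C' '14_861416.1GGC,GGCGCGCGCG' > "$tmp"

# Match only the leading ID and put it back (&) followed by a space.
by_id=$(sed -E 's/^[0-9_.]+/& /' "$tmp")
printf '%s\n' "$by_id"

# Variant: put a space in front of the first upper-case letter instead.
by_upper=$(sed -E 's/[[:upper:]]/ &/' "$tmp")
printf '%s\n' "$by_upper"

rm -f "$tmp"
```

On the real data the same substitution can be run as in the question, reading File1 and redirecting the output to File2.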