How to replace 00 with Na excluding first row & first column using bash in comma separated file

Question

How to replace 00 with Na excluding first row & first column using bash in comma separated file

401 views Asked by AudioBubble At 29 April 2022 at 07:33

I'm working with GWAS data, My data looks like this:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA       
3,GG,AG,00,GT,GG

Desired Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA       
3,GG,AG,N/A,GT,GG

Here I'm trying to replace "00" with "N/A", but since I have 00 in the first row & First Column (IID), the command I used:

sed '1!s~00~N/A~g' allSNIPsFinaldata.csv

The above command excludes the first row but not the first column as a result I got IID Values 100, 200, and 300 as 1N/A, 2N/A, and 3N/A. Can anyone please help "how to exclude the first row & First Column as well and perform the above operation. please help

Original Q&A

There are 6 answers

Egor Lipchinskiy On 29 April 2022 at 07:47

If you want to repolace only 00 in other columns you have to add a delimiter (I am assuming space in my command) in your pattern: sed -i 's~ 00 ~ N/A ~g' allSNIPsFinaldata.csv

tshiono On 29 April 2022 at 08:13

Assuming the columns are separated by space characters such as whitespace or tab character, would you please try:

sed -E '1!s~([[:space:]])00([[:space:]]|$)~\1N/A\2~g' allSNIPsFinaldata.csv

The address 1! skips the 1st row.
The regex ([[:space:]])00([[:space:]]|$) matches the 00 string preceded by a space character (it prevents to match the 1st column) and followed by a space character or the end of the line.

James Brown On 29 April 2022 at 08:27

An awk:

$ awk '
BEGIN {
    FS=OFS=","          # set field delimiters to a comma
}
FNR>1 {                 # process records after the first
    for(i=1;i<=NF;i++)  # iterate all fields (maybe start from 2nd?)
        if($i=="00")    # if field is 00
            $i="N/A"    # replace
}1' file                # output

Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA       
3,GG,AG,N/A,GT,GG

RavinderSingh13 On 29 April 2022 at 10:00

With your shown samples in GNU awk using its gensub function, please try following awk program.

awk '
BEGIN{
  FS=OFS=","
}
FNR==1{
  print
  next
}
{
  secondPart=gensub(/^[^,]*,(.*)/,"\\1","g")
  sub(/^00,/,"N/A,",secondPart)
  gsub(/,00,/,",N/A,",secondPart)
  sub(/,00$/,",N/A",secondPart)
  print $1 OFS secondPart
}
'  Input_file

potong On 29 April 2022 at 11:05

This might work for you (GNU sed):

sed -E '1!{s/,00(,|$)/,N\/A\1/g;s//,N\/A\1/g}' file

If not the first line and , followed by 00 followed by , or end-of-line, replace the 00 by N/A and other parts of the match remain unchanged.

This substitution is global but needs to be implemented twice because the patterns may overlap.

**Ed Morton** · Accepted Answer · 2022-04-29T16:40:05+00:00

Using any awk in any shell on every Unix box:

$ awk '{$0=$0","; gsub(/,00,/,",N/A,"); sub(/,$/,"")} 1' file
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG

The above assumes none of the column names on the first row will be the exact string 00. If they can be then just tweak the above to:

awk 'NR>1{$0=$0","; gsub(/,00,/,",N/A,"); sub(/,$/,"")} 1' file

TechQA.

How to replace 00 with Na excluding first row & first column using bash in comma separated file

There are 6 answers

Related Questions in BASH

Related Questions in UNIX

Related Questions in AWK

Related Questions in SED

Related Questions in GWAS

Popular Questions

Trending Questions