Using multiple conditions in awk


I want to extract information from a large file based on multiple conditions (from the same file) as well as on pattern matching against a second, smaller file. This is the script I have used so far:

awk 'BEGIN{FS=OFS="\t"}NR==FNR{a[$0]++;next}$1 in a {print $2,$4,$5}' file2.txt file1.txt >output.txt

Now I want to add a condition to the same awk script so that it ONLY prints lines where the 4th column (a single character from A, T, G, C) matches the 5th column (also a single character from A, T, G, C). Both columns are in file1.txt.

In other words, I want to merge the following script into the one above:

awk '$4 " "==$5{print $2,$4,$5}' file1.txt

Here is a sample of file1.txt:

SNP Name    Sample ID   GC Score    Allele1 - Forward   Allele2 - Forward
ARS-BFGL-BAC-10172  834269752   0.9374  A   G
ARS-BFGL-BAC-1020   834269752   0.9568  A   A
ARS-BFGL-BAC-10245  834269752   0.7996  C   C
ARS-BFGL-BAC-10345  834269752   0.9604  A   C
ARS-BFGL-BAC-10365  834269752   0.5296  G   G
ARS-BFGL-BAC-10591  834269752   0.4384  A   A
ARS-BFGL-BAC-10793  834269752   0.9549  C   C
ARS-BFGL-BAC-10867  834269752   0.9400  G   G
ARS-BFGL-BAC-10951  834269752   0.5453  T   T



Here is a sample of file2.txt:

    ARS-BFGL-BAC-10172
    ARS-BFGL-BAC-1020
    ARS-BFGL-BAC-10245
    ARS-BFGL-BAC-10345
    ARS-BFGL-BAC-10365
    ARS-BFGL-BAC-10591
    ARS-BFGL-BAC-10793
    ARS-BFGL-BAC-10867
    ARS-BFGL-BAC-10951

Output should be:

834269752   A   A
834269752   C   C
834269752   G   G
834269752   A   A
834269752   C   C
834269752   G   G
834269752   T   T

There is 1 answer

Answer by chthonicdaemon

You can simply combine the two conditions with a boolean AND (&&). From your input file it also seems you can get away with awk's default whitespace field splitting (no FS="\t"), which lets you get rid of that space in the comparison:

awk 'BEGIN{OFS="\t"}
     NR==FNR{a[$0]++;next}
     ($1 in a) && ($4==$5) {print $2,$4,$5}' file2.txt file1.txt > output.txt

As an example, here is my test file2.txt:

ARS-BFGL-BAC-1020
ARS-BFGL-BAC-10172

And here is the result of the command above:

834269752   A   A
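
To try the combined condition end to end, here is a minimal, self-contained sketch. It builds small sample files in a scratch directory and runs essentially the same awk program. The scratch directory, the `wanted` array name, and the choice to key the lookup on `$1` instead of `$0` are my assumptions, not part of the answer above; keying on `$1` just makes the lookup tolerate stray leading whitespace in file2.txt, as in the indented listing in the question.

```shell
#!/bin/sh
# Demo only: the scratch directory is hypothetical, not from the question.
tmpdir=$(mktemp -d)

# file2.txt: the SNP names to keep (the second one is deliberately indented).
printf 'ARS-BFGL-BAC-1020\n    ARS-BFGL-BAC-10172\nARS-BFGL-BAC-10245\n' > "$tmpdir/file2.txt"

# file1.txt: header plus four tab-separated rows taken from the question.
{
  printf 'SNP Name\tSample ID\tGC Score\tAllele1 - Forward\tAllele2 - Forward\n'
  printf 'ARS-BFGL-BAC-10172\t834269752\t0.9374\tA\tG\n'
  printf 'ARS-BFGL-BAC-1020\t834269752\t0.9568\tA\tA\n'
  printf 'ARS-BFGL-BAC-10245\t834269752\t0.7996\tC\tC\n'
  printf 'ARS-BFGL-BAC-10345\t834269752\t0.9604\tA\tC\n'
} > "$tmpdir/file1.txt"

result=$(awk 'BEGIN { OFS = "\t" }
     # First file (NR == FNR): remember each wanted SNP name, keyed on $1.
     NR == FNR { wanted[$1]; next }
     # Second file: print only listed SNPs whose two alleles agree.
     ($1 in wanted) && ($4 == $5) { print $2, $4, $5 }' "$tmpdir/file2.txt" "$tmpdir/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With these sample files the script should print two tab-separated lines, one for ARS-BFGL-BAC-1020 (A A) and one for ARS-BFGL-BAC-10245 (C C). ARS-BFGL-BAC-10172 is listed in file2.txt but dropped because its alleles differ, and ARS-BFGL-BAC-10345 is not listed at all.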