Awk print a value in a column that corresponds to key values


I want to add a column to a file depending on a key value. I have infile_chr"N".txt (where N is a number from 1 to 22) and I need a single output file (outfile.txt) in which the first column is N.

Here is an example of the output file:

1 856108 0.02625
1 870806 0.02625
1 884635 0.02625
...
22 51111340 0.02625
22 51135384 0.02625

The input files, however, have no column containing N. Here are the first two lines of the input file "infile_chr1.txt"; the columns I want to print are marked with **:

**856108**  14774   908823  40  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    ()  0.025   0.024375    **0.02625** 0.975   0.02875
**870806**  55545   921716  40  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    ()  0.025   0.024375    **0.02625** 0.975   0.02875

I tried with this code: for K in {1..22}; do awk '{$2="$K"; print $2,$1,$9}' infile_chr"$K".txt >> outfile.txt; done

but the output is wrong:

$K 856108 0.02625
$K 870806 0.02625
$K 884635 0.02625
$K 899937 0.02625
$K 908823 0.02625

Can anyone help me? Many thanks.


There are 2 answers

anubhava On

You don't need a bash loop; awk can do this in a single command:

awk 'FNR == 1 {f = FILENAME; gsub(/^infile_chr|\.txt$/, "", f)}
     {print f, $1, $9}' infile_chr* > output

cat output 

1 856108 0.02625
1 870806 0.02625
...
...
22 356102 0.08719
22 670808 0.05442
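
One caveat worth checking: the shell expands infile_chr* in lexical order, so infile_chr10.txt is read before infile_chr2.txt. Below is a minimal sketch, not a definitive fix, that lists the files explicitly to keep them in numeric order. The three sample files and their contents are made up for illustration; with the real data in bash, infile_chr{1..22}.txt should produce the same ordering.

```shell
# Made-up sample data: three tiny files with 9 columns each.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
for n in 1 2 10; do
    printf '%s00 a b c d e f g 0.5\n' "$n" > "infile_chr$n.txt"
done

# Files are named explicitly in numeric order (1, 2, 10);
# the glob infile_chr* would instead yield 1, 10, 2.
awk 'FNR == 1 {f = FILENAME; gsub(/^infile_chr|\.txt$/, "", f)}
     {print f, $1, $9}' infile_chr1.txt infile_chr2.txt infile_chr10.txt > outfile.txt

cat outfile.txt
```

If row order does not matter, the plain glob is fine as it is.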
Daweo On
awk '{$2="$K"; print $2,$1,$9}'

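This is not the correct way to use a shell variable inside GNU AWK. The program is wrapped in single quotes, so the shell never expands $K, and AWK treats "$K" as a plain string constant. To access a shell variable, pass it with --assign var=val or -v var=val. In this particular case, the fixed code would be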

for K in {1..22}; do awk --assign K="$K" '{$2=K; print $2,$1,$9}' infile_chr"$K".txt >> outfile.txt; done
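
To see the difference between the two forms side by side, here is a small sketch on a single made-up input line (the `x` placeholders stand in for the columns that are not printed):

```shell
# Made-up one-line input with 9 whitespace-separated columns.
line='856108 x x x x x x x 0.02625'
K=1

# Single quotes stop the shell from expanding $K, and awk sees
# the string constant "$K".
literal=$(printf '%s\n' "$line" | awk '{print "$K", $1, $9}')

# -v copies the shell value into an awk variable named K.
passed=$(printf '%s\n' "$line" | awk -v K="$K" '{print K, $1, $9}')

printf '%s\n%s\n' "$literal" "$passed"
```

The first line printed reproduces the wrong output from the question; the second carries the value of K.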

but you do not need a shell loop for this, as you can fill GNU AWK's ARGV with the filenames to consume. For example, if I needed to output the 1st column of the TAB-separated files file1.tsv to file10.tsv, I could do that with

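awk 'BEGIN{for(i=1;i<=10;i+=1){ARGV[ARGC++]="file" i ".tsv"}}{print $1}'

Explanation: I use a for loop inside AWK. On each iteration I store the name built from "file", the number, and ".tsv" in the next free slot of the ARGV array, then increase the number of arguments (ARGC). ARGC starts at 1 when no files are given, so the first name lands in ARGV[1], and AWK reads ARGV[1] through ARGV[ARGC-1].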

(tested in GNU Awk 5.1.0)
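
Applied to the files from the question, the same idea might look like the sketch below. It assumes the infile_chrN.txt naming from the question, uses three made-up files instead of 22, and appends with ARGV[ARGC++] so that the first name lands in ARGV[1]. The chromosome number is recovered from FILENAME.

```shell
# Made-up sample data: three tiny files with 9 columns each.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
for n in 1 2 3; do
    printf '%s00 a b c d e f g 0.5\n' "$n" > "infile_chr$n.txt"
done

awk 'BEGIN {
         # Queue the input files in numeric order (use i <= 22 for the real data).
         for (i = 1; i <= 3; i++) ARGV[ARGC++] = "infile_chr" i ".txt"
     }
     # At the first line of each file, strip prefix and suffix from its name.
     FNR == 1 { chr = FILENAME; gsub(/^infile_chr|\.txt$/, "", chr) }
     { print chr, $1, $9 }' > outfile.txt

cat outfile.txt
```

Because the loop builds the names in numeric order, the rows come out as 1, 2, 3 regardless of how a shell glob would sort them.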