I would like to use each line of a file, samples.txt, as a regular expression and print every column of input.txt whose header matches it.
samples.txt
aa
bb
cc
input.txt
s aa v dd jj bb ww cc
1 1 1 1 2 3 3 8
3 5 4 5 2 7 5 8
output.txt
aa bb cc
1 3 8
5 7 8
I can do these operations separately: reading each line in bash to use as a regular expression, and using a regular expression to print the matching column. But I cannot put them together. Any suggestions?
To print each matching column I can use:
awk 'NR==1 {for(i=1;i<=NF;i++) if ($i~/$line/) f=i;next} {print $f}' input.txt
And to iterate through the file for each line to use as a regular expression as above:
while read line; do echo $line; done < samples.txt
However, I can't put these two together:
while read line; do
awk 'NR==1 {for(i=1;i<=NF;i++) if ($i~/$line/) f=i;next} {print $f}' input.txt >> output.txt; done < samples.txt
In awk
This collects the samples in an array while reading the first file. Next, on the first line of the second file, it compares each field to the samples and sets a flag of 1 for every field that is the same as one of them.
It then loops over every line, printing only the fields whose flag is set to 1 in the array.
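A minimal sketch of the approach described above might look like the following. It assumes tab-separated output and an exact comparison between header names and samples (not a regex match). The array names samples and keep are illustrative, and the two printf lines only recreate the example files from the question.

```shell
# Recreate the example files from the question (illustrative only).
printf '%s\n' aa bb cc > samples.txt
printf '%s\n' 's aa v dd jj bb ww cc' '1 1 1 1 2 3 3 8' '3 5 4 5 2 7 5 8' > input.txt

awk '
# First file (samples.txt): remember each sample name.
NR == FNR { samples[$1] = 1; next }
# Header line of input.txt: flag the positions whose name is a sample.
FNR == 1 { for (i = 1; i <= NF; i++) if (($i) in samples) keep[i] = 1 }
# Every line of input.txt, header included: print only the flagged fields.
{
    for (i = 1; i <= NF; i++)
        if (i in keep) printf "%s\t", $i   # note: leaves a trailing tab
    print ""
}
' samples.txt input.txt > output.txt
```

Because samples.txt is read first, the NR == FNR test is true only for its lines; this relies on samples.txt not being empty.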
To remove the trailing tab, following (Kent|Fedorqui|Ed Morton)'s advice:
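One common way to avoid a trailing separator is to print the separator before each field except the first. The sketch below assumes that technique; it is not necessarily the exact code those commenters suggested. The variable sep and the array names are illustrative, and the printf lines again just recreate the question's example files.

```shell
# Recreate the example files from the question (illustrative only).
printf '%s\n' aa bb cc > samples.txt
printf '%s\n' 's aa v dd jj bb ww cc' '1 1 1 1 2 3 3 8' '3 5 4 5 2 7 5 8' > input.txt

awk '
NR == FNR { samples[$1] = 1; next }
FNR == 1 { for (i = 1; i <= NF; i++) if (($i) in samples) keep[i] = 1 }
{
    sep = ""                       # empty before the first printed field
    for (i = 1; i <= NF; i++)
        if (i in keep) { printf "%s%s", sep, $i; sep = "\t" }
    print ""                       # end the output line
}
' samples.txt input.txt > output.txt
```

With the example files, each output line should then end directly after its last field, with tabs only between fields.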