How do I filter empty lines (valid rs-number but no statistical data) out of my GWAS sumstats?


My data is a munged GWAS summary statistics file in which the first field contains the rs-number and the following fields contain data such as the alleles and z-values. I want to filter out every row that has a valid rs-number but no statistical data (i.e. every following field is empty).

I work in the Windows console on a remote cluster with a gzip-compressed file (.gz). My command is

#filter out empty columns with valid rs-number
zcat ${targetDir}/data.sumstats.gz | awk -F '\t' '$1 ~ /^rs[0-9]+$/ && NF > 1' | grep -vE '^\s*\t*$' > ${targetDir}/data.sumstats.filtered.txt

However, the output file still contains both complete rows and rows that are empty except for the rs ID.

Do you have an idea why that could be? Thanks for your help!
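
One possible explanation, offered as a sketch rather than a confirmed diagnosis: with -F '\t', awk counts fields by the tab separators, so a row like "rs2", followed only by tabs, still has NF > 1 even though every data field is empty. The grep step only drops lines that consist entirely of whitespace, so it cannot catch a row that starts with an rs ID. The example below checks instead whether any field after the first is non-empty. The sample rows are invented for illustration, and the handling of stray spaces and carriage returns is an assumption about what the real file might contain.

```shell
# Hypothetical sample: a header, two complete rows, and two rows with an rs ID but empty data fields.
sample=$(printf 'SNP\tA1\tA2\tZ\nrs1\tA\tG\t1.5\nrs2\t\t\t\nrs3\t \t\t\r\nrs4\tC\tT\t-0.3\n')

filtered=$(printf '%s\n' "$sample" | awk -F '\t' '
    $1 ~ /^rs[0-9]+$/ {
        for (i = 2; i <= NF; i++) {
            f = $i
            gsub(/[ \r]/, "", f)            # ignore stray spaces and carriage returns
            if (f != "") { print; next }    # at least one data field is filled, keep the row
        }
    }')

# Prints only the rs1 and rs4 rows.
printf '%s\n' "$filtered"
```

On the real data, the same awk program could take the place of the awk and grep steps after zcat. Note that, like the original command, it also drops the header line, because the header does not match the rs pattern.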
