I'm working with GWAS data. Using a PLINK command I was able to get SNPslist, SNPs.map, and SNPs.ped.
Here are the data files and commands I have for 2 SNPs (rs6923761, rs7903146):
$ cat SNPs.map
0 rs6923761 0 0
0 rs7903146 0 0
$ cat SNPs.ped
6 6 0 0 2 2 G G C C
74 74 0 0 2 2 A G T C
421 421 0 0 2 2 A G T C
350 350 0 0 2 2 G G T T
302 302 0 0 2 2 G G C C
The bash commands I used:
echo -n IID > SNPs.csv
cat SNPs.map | awk '{printf ",%s", $2}' >> SNPs.csv
echo >> SNPs.csv
cat SNPs.ped | awk '{printf "%s,%s%s,%s%s\n", $1, $7, $8, $9, $10}' >> SNPs.csv
cat SNPs.csv
Output:
IID,rs6923761,rs7903146
6,GG,CC
74,AG,TC
421,AG,TC
350,GG,TT
302,GG,CC
With only 2 SNPs I could look up their column positions by hand and hard-code them in the command above. But now I have 2000 SNP IDs and their values. I need help with a bash command that can parse all 2000 SNPs in the same way.
One awk idea that replaces all of the current code:
NOTE: remove comments to declutter code
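A minimal sketch of one such awk approach, assuming the layout shown in the question: six leading columns in the .ped file, followed by two allele columns per SNP in the same order as the rows of the .map file. The function name ped_to_csv is hypothetical and used only for illustration. Like the question's code, it takes the sample ID from column 1.

```shell
# ped_to_csv MAP_FILE PED_FILE
# Hypothetical helper (not part of PLINK): prints a CSV with one row per
# sample and one genotype column per SNP, for any number of SNPs.
ped_to_csv() {
    awk '
        # First file (.map): append each SNP ID (column 2) to the header
        FNR == NR { header = header "," $2; next }

        # First row of the second file (.ped): emit the header line once
        FNR == 1  { print "IID" header }

        # Every .ped row: columns 1-6 are sample metadata, and the allele
        # pairs start at column 7, two columns per SNP
        {
            row = $1
            for (i = 7; i < NF; i += 2)
                row = row "," $i $(i + 1)
            print row
        }
    ' "$1" "$2"
}
```

It could be called as ped_to_csv SNPs.map SNPs.ped > SNPs.csv. Because the loop runs up to NF, nothing in it depends on the number of SNPs, so the same call should cover 2000 SNPs as long as the .map and .ped files list them in the same order.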
This generates: