I am trying to do a (bedtools) intersect. Anyone see what is wrong with my bash code here?
#!/bin/bash
dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds
for file in $query/*; do
bedtools intersect -wa -wb -a $file -b $pgs/* -c -C -sorted -filenames
done > ${file%.*}-results.txt
"query" directory contains many files which need to each be iteratively queried against many files in "pgs" directory using the package bedtools and command intersect, which allows an asterisk for file "-b" to cycle multiple files.
Per usage:
One or more BAM/BED/GFF/VCF file(s) “B”. Use “stdin” if passing B with a UNIX pipe.
NEW!!!: -b may be followed with multiple databases and/or wildcard (*) character(s).
I believe the problem is with $pgs/*, looping one file against multiple files in another directory and I may need to restructure my loop to change directory or something.
UPDATE:
I have updated the script to work and was able to get everything working with:
#!/bin/bash -x
dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds
for f in $(ls $query/*.bed); do
for g in $(ls $pgs/*); do
bedtools intersect -wa -wb -a $f -b $g* -C -filenames
done > $(basename $f .ext).txt
done
In your code the output of the whole loop is redirected to one file. Maybe you want something like below (redirect the output of each bedtool to a separate file).
Also note that if you use
${file%.*}
the outputs will be created in the$query
directory so executing the script twice might result in unexpected results. I would recommend to use$(basename $file .ext).txt
(where .ext is the extension of the files in$query
. The basename command keeps only the name of the file (so without the directory) and also removes the .ext if present.