I have a .bed file with 4 columns. The first column contains the chromosome number. I need to write a bash script that selects every row belonging to a specific chromosome, subtracts the second column from the third for each of those rows (this gives the length of the gene), and then calculates the average length of the genes on that chromosome. I have to do this for every chromosome.
This code calculates the average length over the whole table, but I need to do this separately for each chromosome.
```bash
#!/bin/bash
input_bed=${1}
awk 'BEGIN {
    FS="\t"
    sum=0
}
{
    sum+=$3-$2
} END {
    print sum / NR;
}' "${input_bed}"
#Exiting
exit
```
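You can put a condition (an awk pattern) before the line-processing block; the block then runs only on input lines that satisfy it, for example `$1 == "1"` for chromosome 1. Swap "1" for whatever chromosome you are investigating. Note that the divisor must then be a count of the matching lines rather than `NR`, which counts every line in the file.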
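A minimal sketch of the single-chromosome approach, assuming a tab-separated file with the chromosome in column 1, start in column 2 and end in column 3. The function name `avg_len_for_chrom` and the awk variables `chrom` and `n` are made up for illustration; they are not from the question.

```bash
#!/bin/bash
# Hypothetical helper: average gene length for one chromosome.
# Arguments: chromosome label, path to the .bed file.
avg_len_for_chrom() {
    local chrom=$1 input_bed=$2
    awk -v chrom="$chrom" 'BEGIN { FS = "\t" }
    # The condition in front of the block limits it to matching rows.
    $1 == chrom {
        sum += $3 - $2
        n++                # count matching rows; NR would count every row
    }
    END {
        # Print nothing if no row matched, to avoid dividing by zero.
        if (n > 0) print sum / n
    }' "$input_bed"
}
```

With this defined, `avg_len_for_chrom 1 genes.bed` would print the average for chromosome 1, where `genes.bed` stands in for your own file.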
Alternatively, you can do it all in one run by accumulating the sum and the row count for each chromosome in associative arrays keyed on the first column.
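The one-pass version could look roughly like this, under the same assumptions about the file layout (tab-separated, chromosome in column 1, start in column 2, end in column 3). The function name `avg_len_per_chrom` and the array names `sum` and `count` are illustrative choices, and the two-decimal output format is arbitrary.

```bash
#!/bin/bash
# Hypothetical helper: average gene length for every chromosome in one pass.
# Argument: path to the .bed file.
avg_len_per_chrom() {
    local input_bed=$1
    awk 'BEGIN { FS = "\t" }
    {
        sum[$1] += $3 - $2   # total gene length per chromosome
        count[$1]++          # number of rows per chromosome
    }
    END {
        # Iteration order over an awk array is unspecified,
        # so the output is sorted afterwards.
        for (chrom in sum)
            printf "%s\t%.2f\n", chrom, sum[chrom] / count[chrom]
    }' "$input_bed" | sort -k1,1
}
```

Calling `avg_len_per_chrom genes.bed` (again with `genes.bed` as a placeholder) prints one line per chromosome: the chromosome label, a tab, and the average length. The `sort -k1,1` step sorts labels as text, so "10" comes before "2"; drop or adjust it if you need a different order.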