bash script to extract data from large log file


I am using FreeBSD (on a Citrix NetScaler) and need to extract the Mbps values from a log that has hundreds of thousands of lines.

The log looks something like this, where the Mbps number, with decimals, can range from 0.0 to 9999.99 or more:

#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (1.0)… alphatext_anylength... (more_alphatext_in brackets)... 
#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (500.15)… alphatext_anylength... (more_alphatext_in brackets)... 
#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (1500.01)… alphatext_anylength... (more_alphatext_in brackets)... 

Now the challenge: I want to extract the bracketed Mbps numbers that are A) greater than 500 Mbps, B) together with their line numbers. For the above sample output, I want to see only the following:

#>[line number 20] 500.15
#>[line number 55] 1500.01

I have tried:

cat output.log | sed -n -e 's/^.*Mbps//p' |cut -c 3-10

This prints a fixed-width slice of the text after Mbps (characters 3 to 10), but it is not smart enough to show only the bracketed decimal numbers that are greater than 500 Mbps.

I appreciate this might be a bit of a challenge; however, I would be grateful to any bash scripting wizards out there who can create magic!

Thanks in advance!

There are 5 answers

9
Freddy On BEST ANSWER

You can use awk to match the lines containing "Mbps (" followed by any number of characters other than ")" and then a closing ")". Then delete everything from the beginning of the line up to and including "Mbps (", and everything from the ")" to the end of the line.

If the remaining line converted to a number (+0) is greater than 500, print the line number and the line.

awk '
  /Mbps \([^)]*\)/{ sub(/.*Mbps \(/, ""); sub(/\).*/, "") }
  ($0+0) > 500{ print FNR, $0 }
' file

Edit: To allow an optional space after Mbps and report values greater than 50, use

awk '
  /Mbps ?\([^)]*\)/{ sub(/.*Mbps ?\(/, ""); sub(/\).*/, "") }
  ($0+0) > 50{ print FNR, $0 }
' file
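
The question asked for output in the form "#>[line number N] value". Here is a minimal sketch of the same approach that prints that format and keeps the comparison inside the matching rule, so that lines without an Mbps field are never compared against the threshold. The function name mbps_over and the awk variable min are illustrative and not part of the answer above.

```shell
# Hypothetical helper: print "#>[line number N] value" for every line whose
# "Mbps (value)" field is greater than the threshold passed as $1.
mbps_over() {
  awk -v min="$1" '
    /Mbps ?\([^)]*\)/ {
      v = $0
      sub(/.*Mbps ?\(/, "", v)   # drop everything up to and including "Mbps ("
      sub(/\).*/, "", v)         # drop the closing bracket and the rest
      if (v + 0 > min + 0) printf "#>[line number %d] %s\n", FNR, v
    }
  ' "$2"
}
```

It would be called as: mbps_over 500 output.log
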
1
AudioBubble On

I improved the solution of @Freddy a bit

awk '/Mbps.\(.*\)/{sub(/.*Mbps \(/, ""); sub(/\).*/, "")} ($0+0) > 500{print $0}' output.log

Please give him the check :))

11
SiegeX On
$ awk '{match($0,/Mbps \(([^)]*)\)/,a);if(a[1] > 500){print NR,a[1]} }' ./infile
2 500.15
3 1500.01
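
The three-argument form of match() used above is a GNU awk extension and may not be available in the awk that ships with FreeBSD. Here is a sketch of the same idea using only the two-argument match() with RSTART and RLENGTH, assuming exactly one space between Mbps and the opening bracket. The function name mbps_match is illustrative.

```shell
# Hypothetical helper: same logic as the match() one-liner, without the
# gawk-only array argument.
mbps_match() {
  awk '
    match($0, /Mbps \([^)]*\)/) {
      # RSTART points at the "M"; skip the 6 characters of "Mbps (" and
      # leave out the closing bracket (7 wrapper characters in total).
      v = substr($0, RSTART + 6, RLENGTH - 7)
      if (v + 0 > 500) print NR, v
    }
  ' "$1"
}
```

It would be called as: mbps_match ./infile
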
7
agc On

Using three rounds of sed (tested with GNU sed; I am not sure whether it works on BSD sed). This mainly shows why sed is not the easiest tool for this job:

sed '=;s/.*).*(\([0-9.]*\)).*(.*/ \1/' output.log | 
sed ':a;s/[0-9]*/#>[line number &]/;N;s/\n//g;n;ba' | 
sed -n '/\b\([5-9]\|[0-9]\{2,\}\)[0-9]\{2,\}[^]]/p'

Or on BSD sed, which doesn't understand \n, try (tentative attempt, since I'm not running BSD):

sed '=;s/.*).*(\([0-9.]*\)).*(.*/ \1/' output.log | 
sed ':a;s/[0-9]*/#>[line number &]/;N;s/
//g;n;ba' | 
sed -n '/\b\([5-9]\|[0-9]\{2,\}\)[0-9]\{2,\}[^]]/p'

Output:

#>[line number 2] 500.15
#>[line number 3] 1500.01

Notes: Why three rounds?

  1. The = outputs the current line number, but the output bypasses any of the line buffers, making the line number invisible within a single invocation of sed.

  2. That = also outputs an unwanted \n, and in sed that's inconvenient to get rid of. See How can I replace a newline (\n) using sed? which shows how the code works.

  3. sed only sees strings; it doesn't know about numbers and has no way to select a range of numbers by value. See Using sed to replace a number greater than a specified number at a specified position for how we can fake it.

2
AudioBubble On

With brackets as shown, you could use them as input field separators with awk:

awk -F '[()]' '($4+0) > 500 {print FNR, $4}' file

You may also want to check that $3 ends in Mbps:

awk -F '[()]' '($4+0) > 500 && $3~/Mbps *$/ {print FNR, $4}' file
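
Both commands assume that the value is always the fourth field, that is, exactly one bracketed group comes before Mbps. If the number of groups can vary, one option is to scan the fields for a bracket content whose preceding field ends in Mbps. This is a sketch under that assumption; the function name mbps_fields is illustrative.

```shell
# Hypothetical helper: find the Mbps value regardless of how many
# bracketed groups precede it.
mbps_fields() {
  awk -F '[()]' '
    {
      # Field i-1 is the text before a bracket, field i the text after it.
      # Print the first field that follows "Mbps" and exceeds 500.
      for (i = 2; i <= NF; i++)
        if (($(i-1) ~ /Mbps *$/) && ($i + 0 > 500)) { print FNR, $i; break }
    }
  ' "$1"
}
```

It would be called as: mbps_fields file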