Bash: Extract multiple entries from each line


I have a log file that looks somewhat like this after grep my_function $LOG_FILE:

[0] my_function took 96.78581194020808 ms
[1] my_function took 82.0779490750283 ms
[2] my_function took 187.79653799720109 ms
[1] my_function took 98.69955899193883 ms
[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms
[1] my_function took 2.210912061855197 ms
[2] my_function took 3.418975044041872 ms

From this file, I would like to extract only the numbers. Normally I would use awk '{print $4}' for this, but a few lines in this log contain two entries, so I sometimes need to pick two separate values from a single line. How can I select these with bash/GNU tools?

There are 8 answers

1
RavinderSingh13 On BEST ANSWER

With your shown samples, please try the following awk solutions. There is no need to run grep first to search for the string and then print the required value(s); awk can do both by itself.

Using GNU awk here.

awk '
{
  while(match($0,/my_function took (\S+)/,arr)){
     print arr[1]
     $0=substr($0,RSTART+RLENGTH)
  }
}
' Input_file


2nd solution: Set RS to my_function took (\\S+) in GNU awk, then use RT and the split function to get the required output as per the shown samples.

awk -v RS='my_function took (\\S+)' 'RT && split(RT,arr,FS){print arr[3]}' Input_file
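
Both commands above rely on GNU awk features (the third argument to match, and RT). As a rough sketch for other awks, the same loop can be written with only RSTART and RLENGTH, which POSIX match() sets. The helper name extract_times and the two-line sample are made up for illustration, and the pattern assumes the value consists only of digits and dots.

```shell
# Sample data mirroring the question, including the line with two entries.
sample='[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms
[1] my_function took 2.210912061855197 ms'

# extract_times is a hypothetical helper name, not part of the original answer.
extract_times() {
  awk '
  {
    # match() sets RSTART and RLENGTH for each "took <number>" occurrence.
    while (match($0, /took [0-9.]+/)) {
      # Skip the 5 characters of "took " so only the number is printed.
      print substr($0, RSTART + 5, RLENGTH - 5)
      # Drop everything up to the end of this match and search again.
      $0 = substr($0, RSTART + RLENGTH)
    }
  }'
}

printf '%s\n' "$sample" | extract_times
```

This should print the three values one per line, in the order they appear in the input.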
1
blhsing On

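Since all records in your input apparently end with ' ms', you can make it the record separator instead (a multi-character RS is an extension supported by GNU awk, among others). The NF>=4 guard skips the final record, which holds only the trailing newline and would otherwise print an empty line:

awk -v RS=' ms' 'NF>=4{print $4}'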

Demo: https://awk.js.org/?snippet=zqn7kk

0
dawg On

In Ruby:

ruby -lne '$_.scan(/(\d+\.\d+) ms/).each{|m| puts m}' file 

Or Perl:

perl -lnE 'say $1 while (/(\d+\.\d+) ms/g)' file 
0
sudocracy On

You could also pre-process the log file by moving the second record on the same line onto a line of its own and then run awk '{ print $4 }':

grep my_function "$LOG_FILE" | perl -pe 's| ms(?=\[\d+\])| ms\n|g' | awk '{ print $4 }'

The perl one-liner replaces every ms that is followed by [<some-number>] (and only those ms) with an ms and a newline. This works even when there are more than two entries on the same line.

0
anubhava On

Using GNU grep:

grep -oP '\stook\s+\K\S+' file

96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

Here \K discards the text matched so far from the reported match, and \S+ matches one or more non-whitespace characters, i.e. the word that follows took.
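
The -P option needs a grep built with PCRE support. As a sketch for systems without it, the same idea can be split into two steps: match with an extended regex, then cut off the took prefix. The helper name extract_times_portable and the sample are made up for illustration; -o is not in POSIX but is available in GNU and BSD grep, and the pattern assumes values of the form digits, dot, digits.

```shell
# Sample data mirroring the question, including the line with two entries.
sample='[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms
[2] my_function took 3.418975044041872 ms'

# extract_times_portable is a hypothetical helper name.
extract_times_portable() {
  # -o prints each match on its own line; cut then keeps the second
  # space-separated word, which is the number after "took".
  grep -oE 'took [0-9]+\.[0-9]+' | cut -d ' ' -f 2
}

printf '%s\n' "$sample" | extract_times_portable
```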

1
The fourth bird On

Some lines appear to have a second entry appended directly after the first, in the same format, so the value of interest is in every 4th column.

If that is always the case, instead of printing the 4th column, you can print every column where column_number % 4 == 0

awk '{ for (i=1; i<=NF; i++) if (i%4 == 0) print $i }' file

Output

96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872
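
To check why the modulo test works, it can help to number the fields of the fused line. This is only a small sketch; the helper name number_fields is made up for illustration.

```shell
# The single line from the question that carries two entries.
line='[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms'

# number_fields is a hypothetical helper that prints "index:field" pairs.
number_fields() {
  awk '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }'
}

printf '%s\n' "$line" | number_fields
```

Since there is no whitespace between ms and [1], awk treats ms[1] as the single field 5, so the second value lands in field 8 and every number sits at a multiple of 4. If the log ever put a space between the two entries, the positions would shift and this approach would no longer apply.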
3
Daweo On

I would exploit built-in GNU AWK variables for this task in the following way. Let file.txt content be

[0] my_function took 96.78581194020808 ms
[1] my_function took 82.0779490750283 ms
[2] my_function took 187.79653799720109 ms
[1] my_function took 98.69955899193883 ms
[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms
[1] my_function took 2.210912061855197 ms
[2] my_function took 3.418975044041872 ms

then

awk 'BEGIN{FPAT="[0-9]+[.][0-9]+";OFS="\n"}NF{$1=$1;print}' file.txt

gives output

96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

Explanation: I tell GNU AWK that a field consists of one or more (+) digits, followed by a literal dot, followed by one or more digits, and that the output field separator (OFS) should be a newline. Then, for each line with at least one field found (NF), I do $1=$1 to trigger a rebuild of the record and print it. If you want to know more about FPAT, OFS or NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR.

Disclaimer: this solution assumes that values below 1 are written with a leading zero (like 0.5) rather than without one (like .5).

(tested in GNU Awk 5.1.0)

0
Ed Morton On

Using any awk:

$ awk '{for (i=4; i<NF; i+=4) print $i}' file
96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

or get rid of the initial grep you're doing and just run this one command:

$ awk '/my_function /{for (i=4; i<NF; i+=4) print $i}' "$LOG_FILE"

Alternatively, using GNU awk for multi-char RS and RT:

$ awk -v RS='my_function took [0-9.]+' 'RT{$0=RT; print $NF}' "$LOG_FILE"
96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872