Bash: Extract multiple entries from each line

Question

196 views Asked by Green绿色 At 27 October 2023 at 01:36

I have a log file that looks somewhat like this after grep my_function $LOG_FILE:

[0] my_function took 96.78581194020808 ms
[1] my_function took 82.0779490750283 ms
[2] my_function took 187.79653799720109 ms
[1] my_function took 98.69955899193883 ms
[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms
[1] my_function took 2.210912061855197 ms
[2] my_function took 3.418975044041872 ms

From this file, I would like to extract only the numbers. Normally I would use awk '{print $4}' for this, but a few lines in this log contain two entries, and for those I need to extract both values. How can I select these with bash/GNU tools?

There are 8 answers

blhsing On 27 October 2023 at 02:03

Since all records in your input apparently end with ' ms', you can make it the record separator instead:

awk -vRS=' ms' '{print $4}'

Demo: https://awk.js.org/?snippet=zqn7kk

dawg On 27 October 2023 at 02:16

In Ruby:

ruby -lne '$_.scan(/(\d+\.\d+) ms/).each{|m| puts m}' file

Or Perl:

perl -lnE 'say $1 while (/(\d+\.\d+) ms/g)' file

sudocracy On 27 October 2023 at 06:58

You could also pre-process the log so that any additional record on a line is moved onto a line of its own, and then run awk '{ print $4 }':

grep my_function "$LOG_FILE" | perl -pe 's| ms(?=\[\d+\])| ms\n|g' | awk '{ print $4 }'

The perl one-liner replaces every ms that is followed by [<some-number>] (and only those) with ms and a newline. This works even when there are more than two records on the same line.
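The same pre-processing idea can be sketched without perl. This is only an illustration: the helper name split_records and the sample line are made up, and the pattern is simplified to break before any [ that directly follows " ms" instead of checking for [<some-number>]:

```shell
# Hypothetical helper: put each "[n] my_function took X ms" record on its own line.
split_records() {
  # Insert a newline between " ms" and a "[" that follows it directly.
  awk '{ gsub(/ ms\[/, " ms\n["); print }'
}

# Made-up sample with three records fused onto one line.
printf '%s\n' '[0] my_function took 1.5 ms[1] my_function took 2.5 ms[2] my_function took 3.5 ms' |
  split_records | awk '{ print $4 }'
```

Because the records are re-split into lines before the second awk runs, the usual awk '{ print $4 }' applies unchanged.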

anubhava On 27 October 2023 at 08:39

Using GNU grep:

grep -oP '\stook\s+\K\S+' file

96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

Here \K discards the text matched so far, and \S+ matches one or more non-whitespace characters, i.e. the word that follows took.

The fourth bird On 27 October 2023 at 10:15

Some lines appear to contain a second record in the same format directly after the first, so the value of interest is in every 4th column.

If that is always the case, then instead of printing only the 4th column, you can print every column whose number satisfies column_number % 4 == 0:

awk '{ for (i=1; i<=NF; i++) if (i%4 == 0) print $i }' file

Output

96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

Daweo On 27 October 2023 at 18:33

I would exploit built-in GNU AWK variables for this task in the following way. Let the content of file.txt be

[0] my_function took 96.78581194020808 ms
[1] my_function took 82.0779490750283 ms
[2] my_function took 187.79653799720109 ms
[1] my_function took 98.69955899193883 ms
[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms
[1] my_function took 2.210912061855197 ms
[2] my_function took 3.418975044041872 ms

then

awk 'BEGIN{FPAT="[0-9]+[.][0-9]+";OFS="\n"}NF{$1=$1;print}' file.txt

gives output

96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

Explanation: I tell GNU AWK that a field consists of one or more (+) digits, followed by a literal dot, followed by one or more digits, and that the output field separator (OFS) should be a newline. Then, for each line with at least one field found (NF), I do $1=$1 to trigger a rebuild of the record and print it. If you want to know more about FPAT, OFS or NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR.

Disclaimer: this solution assumes that values below 1 are written with a leading zero (like 0.5) rather than without one (like .5).

(tested in GNU Awk 5.1.0)
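If values without a leading zero (such as .5) can occur, the integer part of the pattern has to be optional. A minimal sketch using grep -oE rather than FPAT; the function name extract_decimals and the sample line are made up for illustration, and -o is a widely supported extension rather than a POSIX option:

```shell
# Hypothetical helper: print every decimal number, with or without a leading digit.
extract_decimals() {
  # [0-9]* makes the integer part optional; the dot and the fraction digits stay mandatory,
  # so bare indices such as [0] are not matched.
  grep -oE '[0-9]*[.][0-9]+'
}

# Made-up sample line mixing ".5" and "0.25".
printf '%s\n' '[0] my_function took .5 ms[1] my_function took 0.25 ms' | extract_decimals
```

The same relaxed pattern, [0-9]*[.][0-9]+, should also be usable as the FPAT value in the GNU AWK command above.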

Ed Morton On 28 October 2023 at 00:45

Using any awk:

$ awk '{for (i=4; i<NF; i+=4) print $i}' file
96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

or get rid of the initial grep you're doing and just run this one command:

$ awk '/my_function /{for (i=4; i<NF; i+=4) print $i}' "$LOG_FILE"

Alternatively, using GNU awk for multi-char RS and RT:

$ awk -v RS='my_function took [0-9.]+' 'RT{$0=RT; print $NF}' "$LOG_FILE"
96.78581194020808
82.0779490750283
187.79653799720109
98.69955899193883
10.296131949871778
2.5152561720460653
2.210912061855197
3.418975044041872

RavinderSingh13 On 27 October 2023 at 01:51 (accepted answer)

With your shown samples, please try the following awk solutions. There is no need to run grep first and then print the required value(s); awk can do both by itself.

Using GNU awk here.

awk '
{
  while(match($0,/my_function took (\S+)/,arr)){
     print arr[1]
     $0=substr($0,RSTART+RLENGTH)
  }
}
' Input_file

2nd solution: Set RS to my_function took (\\S+) in GNU awk, then use RT and the split function to get the required output as per the shown samples.

awk -v RS='my_function took (\\S+)' 'RT && split(RT,arr,FS){print arr[3]}' Input_file
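Both solutions rely on GNU awk extensions (the three-argument match(), \S, and RT). For other awks, the first solution's loop can be approximated with the two-argument match() plus substr(). This is a sketch under assumptions: the function name extract_times and the prefix variable are illustrative, and [^ ]+ stands in for \S+, so it treats only spaces (not tabs) as delimiters:

```shell
# Hypothetical helper: print the value after every "my_function took " on each line.
extract_times() {
  awk -v prefix='my_function took ' '{
    line = $0
    # Find the next "my_function took <value>" in what is left of the line.
    while (match(line, prefix "[^ ]+")) {
      # Skip the fixed prefix and print only the value part of the match.
      print substr(line, RSTART + length(prefix), RLENGTH - length(prefix))
      # Continue scanning after the current match.
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

# Sample line taken from the question, with two records on one line.
printf '%s\n' '[0] my_function took 10.296131949871778 ms[1] my_function took 2.5152561720460653 ms' |
  extract_times
```

Working on a copy of the line (line) instead of assigning to $0 avoids re-splitting the record on every iteration.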
