Using blank-line delimited records and colon-separated fields in awk


I'd like to be able to work with a file in awk where records are separated by a blank line and each field consists of a name followed by a colon, some optional whitespace to be ignored/discarded, followed by a value. E.g.

Name: Smith, John
Age: 42

Name: Jones, Mary
Age: 38

Name: Mills, Pat
Age: 62

I understand that I can use RS="" to have awk treat blank lines as record separators and FS="\n" to split each record into one field per line. However, I'd then like to build an array of name/value pairs that I can use for further processing, of the form

if (a["Age"] > 40) {print a["Name"]}

The order is usually consistent, but since it would be dumped in an associative array, the incoming order shouldn't matter or be assumed consistent.

How can I transform the data into an awk associative array with the least fuss?

John1024 (best answer)

Method 1

We use split to split each field into two parts: the key and the value. From these, we create associative array a:

$ awk -F'\n' -v RS= '{for (i=1;i<=NF;i++) {split($i,arr,/: */); a[arr[1]]=arr[2];} if (a["Age"]+0>40) print a["Name"];}' file
Smith, John
Mills, Pat

Method 2

Here, we split fields at either a colon (together with any spaces that follow it) or a newline. The odd-numbered fields are then the keys and the even-numbered ones the values. This assumes that no value contains a colon, which would throw off the pairing:

$ awk -F': *|\n' -v RS= '{for (i=1;i<=NF;i+=2) {a[$i]=$(i+1);} if (a["Age"]+0>40) print a["Name"];}' file
Smith, John
Mills, Pat

Improvement

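Is there a chance that a record will be missing a field? If so, the value left over from an earlier record would still be in a and would be used by mistake, so we should clear the array at the start of each record. In GNU awk, and in most other current awks such as mawk, this is easy. We just add a delete statement:

$ awk -F': *|\n' -v RS= '{delete a; for (i=1;i<=NF;i+=2) {a[$i]=$(i+1);} if (a["Age"]+0>40) print a["Name"];}' file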

For other awks, you may need to delete the array one element at a time (split("", a) is another portable way to empty it):

for (k in a) delete a[k];
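
To see why the clearing matters, here is a small sketch with a record that lacks an Age line. The file name stale.txt and its contents are made up for illustration, and the field separator `: *|\n` is used so that the spaces after each colon are discarded. Without the reset, the second record inherits Age 62 from the first and is printed by mistake:

```shell
# Sample data (assumed for illustration): the second record has no Age line.
cat > stale.txt <<'EOF'
Name: Mills, Pat
Age: 62

Name: Doe, Sam
EOF

# No reset: a["Age"] from the first record leaks into the second.
leaky=$(awk -F': *|\n' -v RS= '{
    for (i = 1; i <= NF; i += 2) a[$i] = $(i+1)
    if (a["Age"] + 0 > 40) print a["Name"]
}' stale.txt)

# Element-by-element reset at the start of every record.
clean=$(awk -F': *|\n' -v RS= '{
    for (k in a) delete a[k]
    for (i = 1; i <= NF; i += 2) a[$i] = $(i+1)
    if (a["Age"] + 0 > 40) print a["Name"]
}' stale.txt)

printf 'without reset:\n%s\n\nwith reset:\n%s\n' "$leaky" "$clean"
rm -f stale.txt
```

The first command reports both Mills, Pat and Doe, Sam; the second reports only Mills, Pat.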