The following outputs "b1 because awk treats the space inside the quotes as a field delimiter. How do I tell awk to ignore quoted delimiters so that this outputs b1 b2 or "b1 b2"?

echo 'a "b1 b2" c'| awk '{print $2}'

I see the following two related posts, but I'm having trouble getting the solutions to work. I was hoping to find a simple solution. Field parsing is awk's specialty, right?

awk ignore delimiter inside single quote within a parenthesis

What's the most robust way to efficiently parse CSV using awk?

4 Answers

hek2mgl On

With gawk (GNU awk) you can use the special variable FPAT to define what a field looks like, instead of being limited to specifying a delimiter:

echo 'a "b1 b2" c'| gawk '{print $2}' FPAT='("[^"]+")|[^[:blank:]]+'

Here we say: a field is either a " followed by one or more non-" characters and a closing " -> ("[^"]+"), or (|) a sequence of non-blank characters -> [^[:blank:]]+

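gawk takes the leftmost-longest match rather than the first alternative that happens to match, so the order of the two patterns does not matter. At an opening " the quoted pattern wins because it matches more text ("b1 b2") than the sequence of non-blank chars ("b1), which is awk's default notion of a field.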

See GNU awk manual: Defining fields by content
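FPAT is gawk-only. If you are stuck with another awk, the same "define a field by content" idea can be approximated with match() in a loop. This is only a rough sketch, not what FPAT does internally: nth_field is a made-up helper name, and it assumes fields are separated by spaces or tabs and that quotes are never escaped or nested.

```shell
# Hypothetical helper: print field number $2 of the line in $1,
# treating a double-quoted run as a single field (quotes are kept).
nth_field() {
  printf '%s\n' "$1" | awk -v n="$2" '
  {
    rest = $0
    count = 0
    # Pull the next field off the front of the remaining text:
    # either "..." or a run of characters that are not space or tab.
    while (match(rest, /"[^"]+"|[^ \t]+/)) {
      count++
      if (count == n) {
        print substr(rest, RSTART, RLENGTH)
        exit
      }
      # Drop everything up to and including the field just matched.
      rest = substr(rest, RSTART + RLENGTH)
    }
  }'
}
```

For example, nth_field 'a "b1 b2" c' 2 prints "b1 b2" (with the quotes), and asking for field 3 prints c.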

Community On

Shortest answer:

echo 'a "b1 b2" c'| awk -F\" '{print $2}'

will output: b1 b2
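Note that with " as the separator the fields are no longer the space-separated columns: odd-numbered fields are the text outside quotes and even-numbered fields are the text inside them. A small sketch to illustrate this, assuming quotes are balanced and never escaped (quoted_parts is just a name picked for this example):

```shell
# Hypothetical helper: print every quoted part of the line in $1,
# one per line, by splitting on the double quote itself.
quoted_parts() {
  printf '%s\n' "$1" | awk -F'"' '{
    # Fields 2, 4, 6, ... are the pieces that sat between quote pairs.
    for (i = 2; i <= NF; i += 2) print $i
  }'
}
```

Running quoted_parts 'a "b1 b2" c "d1 d2" e' prints b1 b2 and d1 d2 on separate lines. The unquoted columns a, c and e are not directly addressable this way, since they end up in the odd fields along with their surrounding spaces.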

ctac_ On

You can get what you are looking for this way:

awk '{split($0,a,/^"|" "| "|" |"$/);j=a[1]!=""?0:1;print a[2+j]}'

I suspect there are inputs where this fails ...

clay On

awk doesn't have the simple, convenient support for quoted fields that I wanted. I also looked at cut, and it doesn't either.

Another widely available command-line tool, csvcut, which is included in a bundle of tools called csvkit, does provide easy support for quoted fields. My data is space-delimited, not comma-delimited, but I can easily specify a space delimiter to csvcut.

This is what I wanted:

# Gives a
echo 'a "b1 b2" c d e' | csvcut -d ' ' -c 1
# Gives b1 b2
echo 'a "b1 b2" c d e' | csvcut -d ' ' -c 2
# Gives c
echo 'a "b1 b2" c d e' | csvcut -d ' ' -c 3