Catch token word in bash 'heredoc'

73 views Asked by At

The beginning of a 'heredoc' string in bash usually looks like

cat <<EOF or cat << EOF

i.e. there may or may not be a space between the two less-than characters and the token word "EOF". I would like to catch the token word, so I try the following

$ pcretest
PCRE version 8.45 2021-06-15

  re> "^\s*cat.*[^<]<{2}[^<](.*)"
data> cat << EOF
 0: cat << EOF
 1: EOF
data> cat <<EOF
 0: cat <<EOF
 1: OF

As you can see in the string where there is no space between the << and EOF, I only catch "OF" and not "EOF". The expression must match exactly two less-than signs and fail if there are three or more. But why does it gobble up the 'E' so that only "OF" is returned?

1

There are 1 answers

4
The fourth bird On BEST ANSWER

In your pattern using are using a negated character class [^<] which matches a single character other than < which in this case is the E char in the string <<EOF

For your examples and using pcre, you could match the leading spaces and then match << without a following <

^\h*cat\h+<<(?!<)(.*)

The pattern matches:

  • ^ Start of string
  • \h* Match optional horizontal whitespace chars
  • cat\h+ Match cat and 1+ horizontal whitespace chars
  • <<(?!<) Match << and assert not < directly to the right
  • (.*) Capture optional chars in group 1

See a regex demo