Regular expressions in grep and sudoers

I am trying to parse a sudoers file that contains its own version of regular expressions, but cannot get a valid and consistent result. I need to be able to compare one variable to another that contains a regular expression.

Source = "padcl"
Regexp = "*dc*"

None of the 3 options tried (=~, grep, awk) will produce the required result for both matching and non-matching.

Script Example

export VAR1="padcl"
export VAR2="*dc*"

if [[ "$VAR1" =~ "$VAR2" ]]
then
        echo "$VAR1 matches $VAR2"
else
        echo "$VAR1 does not match $VAR2"
fi

export VALCNT=`echo "$VAR1" |egrep -E "${VAR2}" |wc -l`
if (( $VALCNT > 0 ))
then
        echo "$VAR1 grep matches $VAR2"
else
        echo "$VAR1 does not grep match $VAR2"
fi

export VALCNT2=`echo "$VAR1" |awk -vparm=${VAR2} '$1 ~ parm {print $0}' |wc -l`
if (( $VALCNT2 > 0 ))
then
        echo "$VAR1 awk matches $VAR2"
else
        echo "$VAR1 does not awk match $VAR2"
fi

Output - Run 1 (should match)

padcl does not match *dc*
padcl grep matches *dc*
padcl does not awk match *dc*

Output - Run 2 (should not match)

padcl does not match *dx*
padcl grep matches *dx*
padcl does not awk match *dx*

I've been coding for 20+ years and know standard regular expressions. How do I get these tools to fall back to the more limited pattern syntax that sudoers supports, without writing my own regular expression parser?

Solution: I found I could pre-process the pattern variable (VAR2) with the following command:

 export PARM=`echo "$VAR2"| sed 's/\*/\.\*/g'`

This rewrites each glob * as the regex .*, which matches zero or more of any character; a bare * in a regex only repeats whatever precedes it.

There are 2 answers

Gordon Davisson

As @Etan Reisner pointed out in a comment, the thing you're trying to match is a wildcard (or glob) pattern, not a regular expression. There are many differences between the two, including:

  • In a glob pattern, * matches any string; in an RE, it means "0 or more of the preceding". To match any string in an RE, you use .* (0 or more anythings).
  • Speaking of which, in an RE . matches any single character; in a glob pattern, it's just a period.
  • To match any single character in a glob pattern, you'd use ?; in a regular expression, that means "0 or 1 of the preceding" (i.e. the last thing is optional, so \.jpe?g would match both ".jpg" and ".jpeg").
  • Many RE recognizers look for substring matches, rather than whole-string matches. Thus, grep dc would match any line that contains "dc". To get a glob pattern to match substrings, add * to the beginning and end of the pattern; to force an RE matcher to look for whole-string matches, add ^ to the beginning and $ to the end of the RE.
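
The differences listed above can be checked directly in bash. This is only a minimal sketch: glob_match and regex_match are hypothetical helper names introduced here for illustration, and the second argument is deliberately left unquoted inside [[ ]] so that bash treats it as a pattern.

```shell
#!/bin/bash
# Hypothetical helpers (names invented for this example).
# The pattern argument is left unquoted so bash interprets it as a pattern.
glob_match()  { [[ "$1" = $2 ]]  && echo yes || echo no; }   # whole-string glob match
regex_match() { [[ "$1" =~ $2 ]] && echo yes || echo no; }   # substring ERE match

glob_match  "padcl" '*dc*'     # yes: * matches any string in a glob
regex_match "padcl" '.*dc.*'   # yes: .* is the regex spelling of "any string"
glob_match  "padcl" 'dc'       # no:  a glob has to match the whole string
regex_match "padcl" 'dc'       # yes: a regex matches substrings by default
regex_match "padcl" '^dc$'     # no:  anchors force a whole-string match
glob_match  "a.c"   'a?c'      # yes: ? matches any single character in a glob
glob_match  "abc"   'a.c'      # no:  . is a literal period in a glob
regex_match "abc"   'a.c'      # yes: . matches any single character in a regex
```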

...etc. There are lots more differences, as well as many variations on both glob and RE syntax. Translating between them is possible, but not generally worth it. Especially in this case, since bash has a built-in glob matching capability:

VAR1="padcl"
VAR2="*dc*"

if [[ "$VAR1" = $VAR2 ]]; then
    echo "$VAR1 matches $VAR2"
else
    echo "$VAR1 does not match $VAR2"
fi

Note that you must use bash (not a generic POSIX shell), so start the script with #!/bin/bash NOT #!/bin/sh. That's because this matching capability is only available in [[ ]], not in [ ], and some other POSIX shells don't support [[ ]]. Also, it's important that $VAR2 not be double-quoted in this context; if it were double-quoted bash would look for a literal match rather than a pattern match (note that this is part of the reason your shell test didn't behave as expected).
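
To make the quoting point concrete, here is a small sketch comparing the quoted and unquoted forms. It also shows a case statement, which does glob matching as well and may be an option if the script ever has to run under a plain POSIX shell. The glob_case function name is made up for this example.

```shell
#!/bin/bash
VAR1="padcl"
VAR2="*dc*"

# Unquoted right-hand side: $VAR2 is treated as a glob pattern.
[[ "$VAR1" = $VAR2 ]]   && unquoted="match" || unquoted="no match"
# Quoted right-hand side: $VAR2 is compared as the literal string "*dc*".
[[ "$VAR1" = "$VAR2" ]] && quoted="match"   || quoted="no match"

echo "unquoted: $unquoted"   # prints "unquoted: match"
echo "quoted: $quoted"       # prints "quoted: no match"

# Hypothetical helper: glob matching with case, which is plain POSIX sh syntax.
# The pattern ($2) must again be left unquoted to be treated as a glob.
glob_case() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

glob_case "$VAR1" "$VAR2" && echo "case: match" || echo "case: no match"
```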

Finally, it's likely that bash's glob syntax isn't quite the same as sudo's. If this is intended for a security-critical purpose (which anything relating to /etc/sudoers tends to be), you may have to write your own recognizer in order to properly match sudo.

Robert Osbourne

I was using the #!/bin/bash shebang; I just didn't include it in the posting.

I found I could pre-process the pattern variable (VAR2) with the following command:

   export PARM=`echo "$VAR2"| sed 's/\*/\.\*/g'`

This rewrites each glob * as the regex .*, which matches zero or more of any character; a bare * in a regex only repeats whatever precedes it. Thanks for the feedback and advice on posting protocol.
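
The sed one-liner only rewrites *, so a pattern containing a period or ? would still be misread by the regex engine, and the result is unanchored. A slightly fuller translation might look like the sketch below. It rests on assumptions: glob_to_ere is a hypothetical helper, not part of bash or sudo; it handles only * and ? as wildcards; and it does not attempt bracket expressions, backslash escapes, or any sudoers-specific rules.

```shell
#!/bin/bash
# Hypothetical helper: translate a simple glob into an anchored extended regex.
# Assumption: only * and ? are wildcards; everything else is taken literally.
glob_to_ere() {
    local glob=$1 out='^' ch i
    for (( i = 0; i < ${#glob}; i++ )); do
        ch=${glob:i:1}
        case $ch in
            '*') out+='.*' ;;    # glob "any string"  -> regex .*
            '?') out+='.'  ;;    # glob "any one char" -> regex .
            '.'|'['|']'|'\'|'^'|'$'|'+'|'('|')'|'{'|'}'|'|')
                 out+="\\$ch" ;; # escape regex metacharacters
            *)   out+=$ch ;;     # ordinary character, copied as-is
        esac
    done
    printf '%s\n' "$out\$"       # trailing $ anchors the end of the string
}

VAR1="padcl"
re=$(glob_to_ere '*dc*')         # ^.*dc.*$

# The regex must be unquoted on the right-hand side of =~.
if [[ "$VAR1" =~ $re ]]; then
    echo "$VAR1 matches $re"
else
    echo "$VAR1 does not match $re"
fi

# The same regex can be passed to grep -E.
printf '%s\n' "$VAR1" | grep -Eq -- "$re" && echo "$VAR1 grep matches $re"
```

Anchoring with ^ and $ keeps the whole-string behaviour of a glob, so a pattern such as dc on its own would no longer match padcl.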