I am trying to parse a sudoers file that contains its own version of regular expressions, but cannot get a valid and consistent result. I need to be able to compare a variable to another that contains a regular expression.
Source = "padcl"
Regexp = "*dc*"
None of the 3 options tried (=~, grep, awk) will produce the required result for both matching and non-matching.
Script Example
export VAR1="padcl"
export VAR2="*dc*"
if [[ "$VAR1" =~ "$VAR2" ]]
then
echo "$VAR1 matches $VAR2"
else
echo "$VAR1 does not match $VAR2"
fi
export VALCNT=`echo "$VAR1" |egrep -E "${VAR2}" |wc -l`
if (( $VALCNT > 0 ))
then
echo "$VAR1 grep matches $VAR2"
else
echo "$VAR1 does not grep match $VAR2"
fi
export VALCNT2=`echo "$VAR1" |awk -vparm=${VAR2} '$1 ~ parm {print $0}' |wc -l`
if (( $VALCNT2 > 0 ))
then
echo "$VAR1 awk matches $VAR2"
else
echo "$VAR1 does not awk match $VAR2"
fi
Output - Run 1 (should match)
padcl does not match *dc*
padcl grep matches *dc*
padcl does not awk match *dc*
Output - Run 2 (should not match)
padcl does not match *dx*
padcl grep matches *dx*
padcl does not awk match *dx*
I've been coding for 20+ years and I know the regexp standard. How do I get a standard regexp matcher to drop down to the more limited pattern syntax that sudoers supports, without writing my own regular expression parser?
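Solution: I found I could pre-process the pattern variable (VAR2, e.g. "*dc*") with the following command
export PARM=`echo "$VAR2"| sed 's/\*/\.\*/g'`
This turns every * in the pattern into .*, which a regexp reads as "0 or more of any character". A bare * in a regexp means "0 or more of the preceding item", which is why the untranslated pattern did not behave like a wildcard.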
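
The pre-processing step above can be sketched end to end as follows. This is only an illustration: `glob_to_regex` is a hypothetical helper name, it handles the `*` wildcard only, and it assumes the pattern contains no other regexp metacharacters (a `.` in the pattern would still match any character). The `^` and `$` anchors are an addition so that the whole string has to match.

```shell
#!/bin/bash

# Hypothetical helper: turn a pattern that uses only the * wildcard
# into an anchored extended regular expression.
glob_to_regex() {
    local translated
    # Replace every literal * with .* (0 or more of any character)
    translated=$(printf '%s\n' "$1" | sed 's/\*/.*/g')
    # Anchor the result so the regex must match the entire string
    printf '^%s$\n' "$translated"
}

VAR1="padcl"
for VAR2 in '*dc*' '*dx*'; do
    PARM=$(glob_to_regex "$VAR2")
    # $PARM is left unquoted so that =~ treats it as a regex
    if [[ "$VAR1" =~ $PARM ]]; then
        echo "$VAR1 matches $VAR2 (regex: $PARM)"
    else
        echo "$VAR1 does not match $VAR2 (regex: $PARM)"
    fi
done
```

With these two patterns, the first iteration reports a match and the second does not.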
As @Etan Reisner pointed out in a comment, the thing you're trying to match is a wildcard (or glob) pattern, not a regular expression. There are many differences between the two, including:
- In a glob pattern, `*` matches any string; in an RE, it means "0 or more of the preceding". To match any string in an RE, you use `.*` (0 or more anythings).
- In an RE, `.` matches any single character; in a glob pattern, it's just a period.
- In a glob pattern, `?` matches any single character; in a regular expression, it means "0 or 1 of the preceding" (i.e. the last thing is optional, so `\.jpe?g` would match both ".jpg" and ".jpeg").
- A glob pattern generally has to match the whole string, while an RE matches substrings by default: `grep dc` would match any line that contains "dc". To get a glob pattern to match substrings, add `*` to the beginning and end of the pattern; to force an RE matcher to look for whole-string matches, add `^` to the beginning and `$` to the end of the RE.
...etc. There are lots more differences, as well as many variations on both glob and RE syntax. Translating between them is possible, but not generally worth it, especially in this case, since bash has a built-in glob matching capability: inside `[[ ]]`, the `==` operator treats an unquoted right-hand side as a glob pattern.
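
A minimal sketch of that built-in glob matching, reusing the variable names from the question. The `glob_match` wrapper is a hypothetical name added here only to make the test reusable; the essential part is the `[[ ... == ... ]]` comparison with the pattern left unquoted.

```shell
#!/bin/bash

# Hypothetical wrapper: succeeds if string $1 matches glob pattern $2.
glob_match() {
    # $2 is deliberately NOT quoted, so bash treats it as a glob pattern
    [[ "$1" == $2 ]]
}

VAR1="padcl"
for VAR2 in '*dc*' '*dx*'; do
    if glob_match "$VAR1" "$VAR2"; then
        echo "$VAR1 matches $VAR2"
    else
        echo "$VAR1 does not match $VAR2"
    fi
done
```

This prints "padcl matches *dc*" followed by "padcl does not match *dx*".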
Note that you must use bash (not a generic POSIX shell), so start the script with `#!/bin/bash`, NOT `#!/bin/sh`. That's because this matching capability is only available in `[[ ]]`, not in `[ ]`, and some other POSIX shells don't support `[[ ]]`. Also, it's important that `$VAR2` not be double-quoted in this context; if it were double-quoted, bash would look for a literal match rather than a pattern match (note that this is part of the reason your shell test didn't behave as expected).
Finally, it's likely that bash's glob syntax isn't quite the same as sudo's. If this is intended for a security-critical purpose (which anything relating to /etc/sudoers tends to be), you may have to write your own recognizer in order to properly match sudo.
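
To illustrate the quoting point made above, here is a small sketch comparing quoted and unquoted right-hand sides inside `[[ ]]`. The result variable names are made up for the example, and the `=~` behaviour shown assumes bash 3.2 or later, where a quoted regex is matched as a literal string.

```shell
#!/bin/bash

VAR1="padcl"
VAR2="*dc*"

# Unquoted right-hand side of ==: treated as a glob pattern
if [[ "$VAR1" == $VAR2 ]]; then UNQUOTED_GLOB="match"; else UNQUOTED_GLOB="no match"; fi

# Quoted right-hand side of ==: compared as the literal string *dc*
if [[ "$VAR1" == "$VAR2" ]]; then QUOTED_GLOB="match"; else QUOTED_GLOB="no match"; fi

# Quoted right-hand side of =~: searched for as the literal text *dc*
if [[ "$VAR1" =~ "$VAR2" ]]; then QUOTED_REGEX="match"; else QUOTED_REGEX="no match"; fi

echo "unquoted ==: $UNQUOTED_GLOB"
echo "quoted ==:   $QUOTED_GLOB"
echo "quoted =~:   $QUOTED_REGEX"
```

Only the unquoted `==` comparison reports a match, which is consistent with the first line of output in the question.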