Get effect of regex /a prior to Perl 5.14


I'm working on some code with unusually high backward compatibility requirements: it must work correctly with Perl >= 5.6.0 (yes, you read that right, 5.6.0) and cannot safely assume the existence of most of the core modules.

In one place in this code I need to convert unprintable characters to \xNN escapes for display in an error message. With the features of more modern Perl,

    $text =~ s/[^\x20-\x7e]/"\\x".sprintf("%02x", ord($&))/aeg;

does the job. However, prior to Perl 5.14 the /a modifier causes a compile-time error.

How do I get the effect of /a in Perl 5.6 through 5.12? A single construct that works over the entire range of versions from 5.6 through present would be preferable, if possible; failing that, something that works in all cases where /a is not available would be fine. Solutions that do not use modules are strongly preferred. EBCDIC support is not necessary.

Answer by ikegami:

/a restricts \d, \s and \w (and their negations \D, \S and \W) to matching ASCII characters only. It restricts the POSIX character classes ([[:name:]]) in the same way.

To safely remove the /a, make the following substitutions:

| Affected | Replacement | Replacement (5.10+) |
|---|---|---|
| \d and [\d] | [0-9] | \p{PosixDigit} |
| \s and [\s] | [ \f\n\r\t\cK] or [\x09-\x0D\x20] | \p{PosixSpace} |
| \w and [\w] | [A-Za-z0-9_] | \p{PosixWord} |
| [[:alpha:]] | [A-Za-z] | \p{PosixAlpha} |
| [[:alnum:]] | [A-Za-z0-9] | \p{PosixAlnum} |
| [[:blank:]] | [ \t] or [\x09\x20] | \p{PosixBlank} |
| [[:cntrl:]] | [\x00-\x1F\x7F] | \p{PosixCntrl} |
| [[:digit:]] | [0-9] | \p{PosixDigit} |
| [[:graph:]] | [\x21-\x7E] | \p{PosixGraph} |
| [[:lower:]] | [a-z] | \p{PosixLower} |
| [[:print:]] | [\x20-\x7E] | \p{PosixPrint} |
| [[:punct:]] | [\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E] | \p{PosixPunct} |
| [[:space:]] | [ \f\n\r\t\cK] or [\x09-\x0D\x20] | \p{PosixSpace} |
| [[:upper:]] | [A-Z] | \p{PosixUpper} |
| [[:word:]] | [A-Za-z0-9_] | \p{PosixWord} |
| [[:xdigit:]] | [0-9a-fA-F] | \p{PosixXDigit} |

(There's one other POSIX character class, [[:ascii:]]. However, it's not affected by /a.)
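
As a rough illustration of the explicit-range replacements, the sketch below checks strings against [0-9] and [A-Za-z0-9_] instead of \d and \w. The helper names is_ascii_digit and is_ascii_word are made up for this example; only the bracketed classes themselves come from the substitutions listed above.

```perl
use strict;

# Hypothetical helpers; the names are illustrative only.
# Explicit ranges match ASCII characters regardless of Perl version,
# so no /a modifier is needed.
sub is_ascii_digit { return $_[0] =~ /\A[0-9]\z/ ? 1 : 0 }
sub is_ascii_word  { return $_[0] =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0 }

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE, a non-ASCII digit

print is_ascii_digit("7"), "\n";            # prints 1
print is_ascii_digit($arabic_three), "\n";  # prints 0
print is_ascii_word("foo_42"), "\n";        # prints 1
```

On a recent Perl, a plain \d without /a would be expected to match the non-ASCII digit, which is exactly the behaviour the explicit range avoids.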


Since the pattern in the question uses none of those classes, /a has no effect on it, and the /a can simply be removed.
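
A minimal sketch of the resulting substitution, wrapped in a helper whose name (escape_unprintable) is invented for this example. It uses a capture group and $1 rather than $&, which is an optional change made here on the assumption that avoiding $& is worthwhile on older perls; the question's original expression with /a simply dropped should behave the same way.

```perl
use strict;

# Hypothetical helper: replace every character outside printable ASCII
# (0x20 through 0x7e) with a \xNN escape. The negated range is explicit,
# so the pattern needs no /a and uses only /e and /g.
sub escape_unprintable {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg;
    return $text;
}

print escape_unprintable("tab\there\n"), "\n";   # prints tab\x09here\x0a
```

Note that a character above 0xFF produces more than two hex digits with this format, so input containing such characters may call for a different escape form.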