I have a series of regex patterns that get grouped into categories. I'm trying to statically compile them into alternations, but I don't want any special meanings to get lost.
As an example, I'm identifying Raspberry Pi GPIO pins by name. There are the GPIO pins, 0-27 (coincidentally the same numbers as the BCM nomenclature), the voltage-reference pins, and the named-function pins. Depending on which category a particular physical pin falls into, assumptions can be made; for example, a voltage-reference pin never has a pull-up status or a GPIO/BCM number.
So:
_cats = {
    'data': (
        r'gpio\.?([0-9]|[1-3][0-9]|40)',
    ),
    'vref': (
        r'v3_3',
        r'v5',
        r'gnd',
    ),
    'named': (
        r'SDA\.?([01])',
        r'CE\.?0',
        r'CE\.?1',
    ),
}
The first thing I want to do is combine all of the patterns into a single compiled alternation so I can check whether an input actually matches any of my keys. For the single pattern in the first category, I could simply use:
crx = re.compile(r'\A' + _cats['data'][0] + r'\Z', re.IGNORECASE)
For all of them, I could do something like:
crx = re.compile('|'.join([(r'\A' + rx + r'\Z') for rx in _cats['vref']]), re.IGNORECASE)
but this is starting to confuse me. Each regex term should be bounded by ^ and $ (or \A and \Z), but joining them into alternations and then compiling them is giving me issues.
I'm looking for something like Emacs' regexp-opt function.
I've tried variations on the approach described above, and I keep getting syntax errors, patterns that don't match anything, and patterns that match too much.
Edit
Thanks for the comments which helped clarify and solve my main question, but I think the second part got lost somewhere. Specifically,
Is a compiled regex itself a regular expression, or is it a sort of opaque end-point? Would this (p-codish) work?
rx_a = re.compile(r'(?:a|1|#)')
rx_b = re.compile(r'(?:[b-z]|[2-9]|@)')
rx_c = re.compile('|'.join([repr(rx_a), repr(rx_b)]))
Or something of the sort?
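For what it's worth, a compiled pattern is an object rather than a string, but it keeps its source text in the .pattern attribute, so something close to the above should be possible. The sketch below reuses the rx_a, rx_b and rx_c names from the snippet above. The extra (?:...) wrapper around each piece is an added precaution, not part of the original. Note that flags passed when compiling the originals are not carried over by .pattern and would need to be passed again.

```python
import re

rx_a = re.compile(r'(?:a|1|#)')
rx_b = re.compile(r'(?:[b-z]|[2-9]|@)')

# repr() of a compiled pattern is text such as "re.compile('(?:a|1|#)')",
# not the regex source, so join the .pattern strings instead.
# Each piece is wrapped in a non-capturing group so its own alternation
# cannot leak into the combined expression.
rx_c = re.compile('|'.join('(?:' + rx.pattern + ')' for rx in (rx_a, rx_b)))
```

With this, rx_c.fullmatch('q') and rx_c.fullmatch('#') both succeed, while rx_c.fullmatch('0') returns None.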
I believe you can achieve all of your requirements with regex named groups. The syntax is: (?P<NAME>EXPRESSION).
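As a rough sketch of how that might look for the pin categories in the question: each category becomes one named group, the groups are joined into a single alternation, and the name of the group that matched identifies the category. The helper names build_classifier and classify are made up for illustration, and the _cats table here assumes every pattern is a separate tuple entry.

```python
import re

# Category table modelled on the question's; one pattern per tuple entry.
_cats = {
    'data': (r'gpio\.?([0-9]|[1-3][0-9]|40)',),
    'vref': (r'v3_3', r'v5', r'gnd'),
    'named': (r'SDA\.?([01])', r'CE\.?0', r'CE\.?1'),
}


def build_classifier(cats):
    """Compile all categories into one alternation of named groups."""
    parts = []
    for name, patterns in cats.items():
        # Wrap each pattern in (?:...) so its internal alternation
        # stays local to that pattern.
        body = '|'.join('(?:' + p + ')' for p in patterns)
        parts.append('(?P<' + name + '>' + body + ')')
    return re.compile('|'.join(parts), re.IGNORECASE)


_crx = build_classifier(_cats)


def classify(pin_name):
    """Return the category whose pattern matches the whole name, else None."""
    # fullmatch() requires the entire string to match, which replaces
    # the explicit \A ... \Z anchors around every term.
    m = _crx.fullmatch(pin_name)
    if m is None:
        return None
    # groupdict() holds only the named groups; exactly one is non-None.
    return next(k for k, v in m.groupdict().items() if v is not None)
```

For example, classify('GPIO17') gives 'data', classify('gnd') gives 'vref', and classify('gpio41') gives None. Using fullmatch() rather than anchoring each term sidesteps the precedence question of how \A and \Z interact with the | operator.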