Proper way to build a compiled alternation list

I have a series of regex patterns grouped into categories. I'm trying to statically compile each category into an alternation without losing the special meaning of any pattern.

As an example, I'm identifying Raspberry Pi GPIO pins by name. There are the GPIO pins, 0-27 (coincidentally the same numbers as the BCM nomenclature), the voltage-reference pins, and the named-function pins. Depending on which category a physical pin falls into, certain assumptions can be made; for example, a voltage-reference pin never has a pull-up status or a GPIO/BCM number.

So:

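# category name -> tuple of regex patterns
_cats = {
    'data': (
        r'gpio\.?([0-9]|[1-3][0-9]|40)',
    ),
    'vref': (
        r'v3_3',
        r'v5',   # comma required, otherwise Python merges this with the next literal
        r'gnd',
    ),
    'named': (
        r'SDA\.?([01])',
        r'CE\.?0',   # comma required here too
        r'CE\.?1',
    ),
}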
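One pitfall with tuples of patterns written this way: Python concatenates adjacent string literals when it parses the source, so a missing comma between two entries silently merges them into one pattern (for example `r'v5'` followed by `r'gnd'` becomes `'v5gnd'`), which then matches neither input. A quick check:

```python
# Adjacent string literals are joined by the parser, so the second
# tuple ends up with two entries instead of three.
with_comma = (r'v3_3', r'v5', r'gnd')
without_comma = (r'v3_3', r'v5' r'gnd')

print(len(with_comma))     # 3
print(len(without_comma))  # 2
print(without_comma[1])    # v5gnd
```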

The first thing I want to do is combine all of the patterns into a single compiled alternation so I can check whether an input matches any of my keys. For a single pattern from the first category, I could simply use:

import re
crx = re.compile(r'\A' + _cats['data'][0] + r'\Z', re.IGNORECASE)
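As an aside, Python's `re` module also provides `fullmatch()`, which only succeeds if the entire string matches, so the `\A`/`\Z` anchors do not have to be spliced into the pattern text. A sketch, with the `'data'` pattern copied from the dict so the snippet runs on its own. The last line shows why anchoring matters here: an unanchored `search()` accepts the first alternative that works (`[0-9]`) and never tries `[1-3][0-9]`.

```python
import re

# Pattern copied from _cats['data'][0]; fullmatch() supplies the anchoring.
data_rx = re.compile(r'gpio\.?([0-9]|[1-3][0-9]|40)', re.IGNORECASE)

print(bool(data_rx.fullmatch('GPIO.17')))  # True
print(bool(data_rx.fullmatch('gpio17x')))  # False: trailing 'x'
# search() is unanchored and stops at the first alternative that succeeds:
print(data_rx.search('gpio17').group())    # gpio1
```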

For all the patterns in one category, I could do something like:

crx = re.compile('|'.join(r'\A' + p + r'\Z' for p in _cats['vref']), re.IGNORECASE)

but this is starting to confuse me. Each term should be bounded by ^$ or \A\Z, but joining the terms into alternations and then compiling them keeps causing problems.
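One way to keep each term's meaning intact is to wrap every pattern in a non-capturing group `(?:...)` before joining, and to put the anchors around the whole alternation exactly once. Because `|` has the lowest precedence in a regex, a term such as `SDA|SCL` joined without a group would otherwise bind as `\ASDA` and `SCL\Z`. A sketch, assuming the `'vref'` patterns from the question with commas between all entries; the helper name `anchored_alternation` is hypothetical:

```python
import re

def anchored_alternation(patterns, flags=0):
    """Compile patterns into one regex that must match the entire input.

    Each pattern is wrapped in a non-capturing group so any '|' inside it
    stays local, and the anchors surround the whole alternation once.
    """
    body = '|'.join(f'(?:{p})' for p in patterns)
    return re.compile(rf'\A(?:{body})\Z', flags)

vref_rx = anchored_alternation((r'v3_3', r'v5', r'gnd'), re.IGNORECASE)
print(vref_rx.pattern)             # \A(?:(?:v3_3)|(?:v5)|(?:gnd))\Z
print(bool(vref_rx.match('GND')))  # True
print(bool(vref_rx.match('v5x')))  # False
```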

I'm looking for something like Emacs' regexp-opt function.
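As far as I know, Python's standard library has no direct counterpart to `regexp-opt`, which also factors common prefixes into a compact regex. For *literal* strings, a rough stand-in is to escape each one with `re.escape()` and join them longest first. This sketch does no prefix factoring and is not meant for strings that are already regexes; the name `regexp_opt` is borrowed for illustration only and is hypothetical:

```python
import re

def regexp_opt(strings, flags=0):
    """Return a compiled regex matching any of the given literal strings.

    re.escape() neutralises metacharacters such as '.', and ordering the
    alternatives longest-first stops an unanchored search from settling on
    a shorter string that is a prefix of a longer one.
    """
    ordered = sorted(set(strings), key=lambda s: (-len(s), s))
    return re.compile('|'.join(re.escape(s) for s in ordered), flags)

opt_rx = regexp_opt(['v3', 'v3.3', 'v5'])
print(opt_rx.pattern)                     # v3\.3|v3|v5
print(opt_rx.search('pin v3.3').group())  # v3.3
print(opt_rx.fullmatch('v3x3'))           # None: '.' was escaped
```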

I've tried variations on this theme and have gotten syntax errors, patterns that don't match anything, and patterns that match too much.

Edit

Thanks for the comments, which helped clarify and solve my main question, but I think the second part got lost somewhere. Specifically:

Is a compiled regex itself a regular expression, or is it a sort of opaque endpoint? Would this (pseudocode-ish) work?

rx_a = re.compile(r'(?:a|1|#)')
rx_b = re.compile(r'(?:[b-z]|[2-9]|@)')
rx_c = re.compile('|'.join([repr(rx_a), repr(rx_b)]))

Or something of the sort?
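On the second question: a compiled `re.Pattern` is an object, not a pattern string, and `repr()` returns its Python representation (`re.compile('...')`) rather than the regex source. The source string is available as the `.pattern` attribute, so compiled patterns can be combined by joining their `.pattern` strings. Flags are not part of `.pattern` (they are stored in `.flags`), so they have to be passed again when recompiling. A sketch using the question's two patterns:

```python
import re

rx_a = re.compile(r'(?:a|1|#)')
rx_b = re.compile(r'(?:[b-z]|[2-9]|@)')

print(repr(rx_a))    # re.compile('(?:a|1|#)')
print(rx_a.pattern)  # (?:a|1|#)

# Join the source strings, not the Pattern objects or their repr().
rx_c = re.compile('|'.join([rx_a.pattern, rx_b.pattern]))

print(bool(rx_c.fullmatch('#')))  # True  (from rx_a)
print(bool(rx_c.fullmatch('@')))  # True  (from rx_b)
print(bool(rx_c.fullmatch('A')))  # False (no IGNORECASE flag)
```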

Answer by OysterShucker

I believe you can achieve all of your requirements with regex named groups.

The syntax is: (?P<NAME>EXPRESSION).
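For illustration, a small two-category version (the `\d+` is a simplification standing in for the real GPIO number range). `groupdict()` returns every named group, with `None` for any group that did not take part in the match:

```python
import re

# Simplified two-category pattern; \d+ stands in for the real GPIO range.
pin_rx = re.compile(r'(?P<data>gpio\.?\d+)|(?P<vref>v3_3|v5|gnd)')

m = pin_rx.fullmatch('gnd')
print(m.group('vref'))  # gnd
print(m.groupdict())    # {'data': None, 'vref': 'gnd'}
```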

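import re

#expressions: one named group per category
DATA  = r'(?P<data>gpio\.?([0-9]|[1-3][0-9]|40))'
VREF  = r'(?P<vref>v3_3|v5|gnd)'
NAMED = r'(?P<named>SDA\.?([01])|CE\.?[01])'

#compiled expression; fullmatch() anchors the whole input like \A...\Z
#(search() would stop at 'gpio.1' inside 'gpio.17')
match = re.compile(fr'{DATA}|{VREF}|{NAMED}', re.IGNORECASE).fullmatch

#find; groups for categories that did not match print None
your_data = 'gpio.17'  # hypothetical sample input
if m := match(your_data):
    print(m.group('data'))
    print(m.group('vref'))
    print(m.group('named'))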
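Building on the named-group approach, the combined pattern can also be generated from the question's `_cats` dict, with `fullmatch()` supplying the anchoring and `Match.lastgroup` reporting which category matched. A group name may appear only once per pattern, so each category gets a single named group holding its own alternation of `(?:...)`-wrapped terms. `lastgroup` gives the category here because each named group is the outermost group in its branch and therefore closes last. A sketch; the helper `category()` is hypothetical:

```python
import re

# Category table from the question, with commas between all entries.
_cats = {
    'data': (r'gpio\.?([0-9]|[1-3][0-9]|40)',),
    'vref': (r'v3_3', r'v5', r'gnd'),
    'named': (r'SDA\.?([01])', r'CE\.?0', r'CE\.?1'),
}

# One named group per category; each term inside is wrapped in (?:...)
# so its own '|' cannot leak into the neighbouring alternatives.
pin_rx = re.compile(
    '|'.join(
        f'(?P<{cat}>' + '|'.join(f'(?:{p})' for p in pats) + ')'
        for cat, pats in _cats.items()
    ),
    re.IGNORECASE,
)

def category(name):
    """Return the category whose patterns match the whole name, else None."""
    m = pin_rx.fullmatch(name)
    return m.lastgroup if m else None

print(category('GPIO.17'))  # data
print(category('gnd'))      # vref
print(category('ce1'))      # named
print(category('gpio99'))   # None
```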