Python: ICD-10 RegEx

766 views Asked by At

Goal: create regex of ICD-10 codes.

Format

  • Compulsory start: Letter, Digit, (either Letter or Digit),
  • Optional end: has a . then up to 4 Letters or Digits

I've most of the 1st half:

r'[A-Z][0-9][0-9]'

The second half I'm stuck on:

([a-z]|[0-9]){1,4}$

If there is something generated, it must have a dot .

Examples: .0 or .A9 or .A9A9 or .ZZZZ or .9999 etc.


Test Python RegEx

Note: I know some ICD-10 codes don't surpass a certain number/ letter; but I am fine with this.

1

There are 1 answers

0
Wiktor Stribiżew On BEST ANSWER

You can use

^[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?$

See the regex demo. Details:

  • ^ - start of string anchor
  • [A-Z] - an uppercase ASCII letter
  • [0-9] - an ASCII only digit
  • [A-Z0-9] - an uppercase ASCII letter or an ASCII digit
  • (?:\.[A-Z0-9]{1,4})? - an optional sequence of
    • \. - a dot
    • [A-Z0-9]{1,4} - one to four occurrences of an uppercase ASCII letter or an ASCII digit
  • $ - end of string anchor (or \Z can be used here, too).

In Python code, you can use the following to validate string input:

icd10_rx = re.compile(r'[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?')
if icd10_rx.fullmatch(text):
    print(f'{text} is valid!')

Note the anchors are left out because Pattern.fullmatch (same as re.fullmatch) requires a full string match.