Goal: create a regex that matches ICD-10 codes.
Format
- Compulsory start: Letter, Digit, (either Letter or Digit)
- Optional end: has a . then up to 4 Letters or Digits
I have most of the first half:
r'[A-Z][0-9][0-9]'
The second half I'm stuck on:
([a-z]|[0-9]){1,4}$
If the optional part is present, it must start with a dot (.).
Examples: .0 or .A9 or .A9A9 or .ZZZZ or .9999 etc.
Note: I know some ICD-10 codes don't go beyond a certain number/letter, but I am fine with this.
You can use
^[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?$
Details:
- ^ - start of string anchor
- [A-Z] - an uppercase ASCII letter
- [0-9] - an ASCII-only digit
- [A-Z0-9] - an uppercase ASCII letter or an ASCII digit
- (?:\.[A-Z0-9]{1,4})? - an optional sequence of
  - \. - a dot
  - [A-Z0-9]{1,4} - one to four occurrences of an uppercase ASCII letter or an ASCII digit
- $ - end of string anchor (or \Z can be used here, too).

In Python code, you can use the following to validate string input:
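A minimal sketch of such a validation, assuming the pattern from the breakdown above. The names ICD10_RE and is_icd10 are illustrative only and are not taken from the original answer:

```python
import re

# Pattern from the breakdown above, written without ^ and $:
# fullmatch already requires the whole string to match.
ICD10_RE = re.compile(r'[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?')

def is_icd10(code: str) -> bool:
    """Return True if the whole string has the ICD-10 shape described above."""
    return ICD10_RE.fullmatch(code) is not None

# A few sample inputs: the first three fit the format, the last three do not.
for sample in ["A01", "A0Z.A9", "Z99.9999", "A01.", "A01.12345", "a01"]:
    print(sample, is_icd10(sample))
```

The pattern is compiled once so it can be reused across many inputs. Lowercase input is rejected as written; passing re.IGNORECASE to re.compile would be one way to accept it, if that is wanted.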
Note the anchors are left out because Pattern.fullmatch (same as re.fullmatch) requires a full string match.
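One practical consequence, shown as a small sketch: in Python, $ also matches just before a trailing newline, whereas \Z and fullmatch do not. The variable names core and s are illustrative only:

```python
import re

core = r'[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?'
s = "A01.9\n"  # a well-formed code followed by a newline

print(bool(re.match('^' + core + '$', s)))    # True: $ tolerates the trailing newline
print(bool(re.match('^' + core + r'\Z', s)))  # False: \Z only matches at the very end
print(bool(re.fullmatch(core, s)))            # False: fullmatch must consume the newline too
```

If input may arrive with a trailing newline that should be rejected, fullmatch or \Z is the stricter choice.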