Extracting ICCID from a string using regex


I'm trying to return and print the ICCID of a SIM card in a device; the SIM cards are from various suppliers and therefore of differing lengths (either 19 or 20 digits). As a result, I'm looking for a regular expression that will extract the ICCID (in a way that's agnostic to non-word characters immediately surrounding it).

Given that an ICCID is specified as a 19-20 digit string starting with "89", I've simply gone for:

(89\d{17,18})

This was the most successful pattern that I'd tested (along with some patterns rejected for reasons below).

In the string that I'm extracting it from, the ICCID is immediately followed by a carriage return and then a line feed, but terminating the pattern with \r, \n, or even \b failed to work in testing (the program that I'm using is an in-house one built on Python, so I suspect that's the regex engine it uses). Also, simply using (\d{19,20}) ended up extracting the last 19 digits of a 20-digit ICCID (as the third and last valid match). Along the same lines, I ruled out (\d{19,20})? in principle, as I expect that to finish when it finds the first 19 digits.

So my question is: Should I use the pattern I've chosen, or is there a better expression (not using non-word characters to frame the string) that will return the longest substring of a variable-length string of digits?


There are 3 answers

0
Wiktor Stribiżew On BEST ANSWER

If the engine behind the scenes is really Python, and there can be any non-digit characters around the value you need to extract, use lookarounds to restrict the context around the value:

(?<!\d)89\d{17,18}(?!\d)
^^^^^^^         ^^^^^^

The (?<!\d) negative lookbehind requires the absence of a digit immediately before the match, and the (?!\d) negative lookahead requires the absence of a digit immediately after it.

See this regex demo
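
A minimal sketch of how this pattern could be applied with Python's re module, assuming the engine really is Python. The sample response strings and the helper name extract_iccid are made up for illustration and are not taken from the question:

```python
import re

# Lookarounds keep the match from starting or ending inside a longer digit run.
ICCID_RE = re.compile(r"(?<!\d)89\d{17,18}(?!\d)")

def extract_iccid(text):
    """Return the first 19- or 20-digit ICCID found in text, or None."""
    match = ICCID_RE.search(text)
    return match.group(0) if match else None

# Hypothetical device output: the ICCID is followed by \r\n.
sample_20 = "+CCID: 89014103211118510720\r\nOK\r\n"
sample_19 = "ICCID=8901410321111851072\r\n"

print(extract_iccid(sample_20))  # all 20 digits
print(extract_iccid(sample_19))  # all 19 digits
# A 22-digit run is rejected: the "89" is preceded by another digit.
print(extract_iccid("id 1289014103211118510720 "))  # None
```

Because \d{17,18} is greedy, a 20-digit value is taken whole, and the lookahead stops a 19-digit match from being cut out of a 20-digit one.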

2
freefall On

I'd go for

89\d{17,18}[^\d]

The quantifier is greedy, so this prefers 18 digits after the "89", but 17 would also suffice. The [^\d] then requires the next character to be a non-digit.

Only limitation: there must be at least one more character after the ICCID (which should be okay from what you described).

Be aware that a longer run of digits containing "89" followed by 17 or 18 more digits would also match, since nothing restricts what comes before the "89".
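
The two caveats above can be checked with a short sketch, assuming Python's re module. The ICCID value is made up for illustration:

```python
import re

# Pattern from this answer: the ICCID digits plus one trailing non-digit.
PATTERN = re.compile(r"89\d{17,18}[^\d]")

iccid = "89014103211118510720"  # made-up 20-digit value

# The trailing non-digit is part of the match, so it has to be stripped
# (or the digits wrapped in a capture group).
with_crlf = PATTERN.search(iccid + "\r\n")
print(repr(with_crlf.group(0)))  # the 20 digits followed by '\r'

# With nothing after the digits there is no match at all.
print(PATTERN.search(iccid))  # None

# Nothing guards the left side, so the match can start inside a longer number.
inside = PATTERN.search("12" + iccid + "\r\n")
print(repr(inside.group(0)))  # same 20 digits plus '\r', taken from a 22-digit run
```

Note that the matched text includes the character consumed by [^\d], which is one reason a lookahead or a capture group around the digits may be preferable.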

0
A_Elric On

(\d+)\D+

seems like it would do the trick readily. (\d+) would capture the whole run of digits (all 20 of them for a 20-digit ICCID). \D+ would match the non-digit characters that follow.
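
A quick sketch of how this pattern behaves, assuming Python's re module and made-up input strings. It does not check the length or the "89" prefix, so it only works when the ICCID is the first digit run in the string:

```python
import re

PATTERN = re.compile(r"(\d+)\D+")

iccid = "89014103211118510720"  # made-up 20-digit value

# Group 1 holds the digit run; \D+ consumes the trailing \r\n.
clean = PATTERN.search(iccid + "\r\n")
print(clean.group(1))  # the full 20 digits

# Any earlier digit run followed by a non-digit is matched first.
noisy = PATTERN.search("SIM 2: " + iccid + "\r\n")
print(noisy.group(1))  # "2", not the ICCID
```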