Avoid extracting IBAN number from string


I am trying to avoid extracting the IBAN number from my string.

Example:

import re

def get_umsatzsteuer_identifikationsnummer(string):
  # Demo --> https://regex101.com/r/VHaS7Y/1
  # Return every substring that looks like a German VAT ID (USt-IdNr.).
  reg = r'DE[0-9 ]{12}|DE[0-9]{9}|DE [0-9]{9}'
  match = re.compile(reg)
  matched_words = match.findall(string)

  return matched_words


string = (
    "I want to get this DE813992525 and this DE813992526 number and this "
    "number DE 813 992 526 and this number  DE 813992526. I do not want the bank "
    "account number: IBAN DE06300501100011054517."
)

get_umsatzsteuer_identifikationsnummer(string)


>>>>> ['DE813992525',
 'DE813992526',
 'DE 813 992 526',
 'DE 813992526',
 'DE063005011000']

The last entry in the results is the first part of the German IBAN, which I don't want to extract. How can I avoid matching it?

Best answer, by The fourth bird:

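You can shorten the alternation by making the space optional: DE ?[0-9]{9} covers both DE[0-9]{9} and DE [0-9]{9}. To drop the IBAN fragment while still matching the number that is followed by a dot, add a negative lookahead, (?!\d), which asserts that the match is not directly followed by a digit. The leading \b also keeps DE from matching in the middle of a longer word.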

\b(?:DE[0-9 ]{12}|DE ?[0-9]{9})(?!\d)
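
As a minimal sketch of how this pattern could be used in Python, the snippet below runs it against the sample text from the question. The helper name find_vat_ids is made up for illustration and is not part of the original code.

```python
import re

# Pattern from the answer: the trailing (?!\d) rejects any match that is
# immediately followed by another digit, which is what excludes the IBAN.
VAT_ID = re.compile(r'\b(?:DE[0-9 ]{12}|DE ?[0-9]{9})(?!\d)')


def find_vat_ids(text):  # hypothetical helper name
    """Return all VAT-ID-like matches found in text."""
    return VAT_ID.findall(text)


text = (
    "I want to get this DE813992525 and this DE813992526 number and this "
    "number DE 813 992 526 and this number  DE 813992526. I do not want the bank "
    "account number: IBAN DE06300501100011054517."
)

print(find_vat_ids(text))
```

With the question's sample text this should return the four VAT IDs and no IBAN fragment, because both alternatives would end directly in front of another digit inside the IBAN.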


You can also make the pattern more precise for the third example by matching exactly three groups of a space followed by three digits, since [0-9 ]{12} would also match other mixes of digits and spaces, including 12 spaces in a row.

\b(?:DE(?: \d{3}){3}|DE ?[0-9]{9})(?!\d)
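
To illustrate the difference between the two patterns, here is a small sketch in Python. The names LOOSE and STRICT and both input strings are made up for this comparison.

```python
import re

# LOOSE allows any mix of 12 digits and spaces after "DE";
# STRICT requires exactly three " ddd" groups for the spaced form.
LOOSE = re.compile(r'\b(?:DE[0-9 ]{12}|DE ?[0-9]{9})(?!\d)')
STRICT = re.compile(r'\b(?:DE(?: \d{3}){3}|DE ?[0-9]{9})(?!\d)')

blank_run = "DE" + " " * 12 + "end"      # made-up input: "DE" plus 12 spaces
spaced_id = "number DE 813 992 526 and"  # spaced VAT ID from the question

print(LOOSE.findall(blank_run))   # the run of spaces is accepted
print(STRICT.findall(blank_run))  # nothing matches
print(STRICT.findall(spaced_id))  # the spaced VAT ID still matches
```

The stricter version trades a little length for fewer false positives on text where "DE" happens to be followed by whitespace.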
