I am trying to avoid extracting the IBAN number from my string.
Example:
import re

def get_umsatzsteuer_identifikationsnummer(string):
    # Demo --> https://regex101.com/r/VHaS7Y/1
    reg = r'DE[0-9 ]{12}|DE[0-9]{9}|DE [0-9]{9}'
    match = re.compile(reg)
    matched_words = match.findall(string)
    return matched_words
string = ("I want to get this DE813992525 and this DE813992526 number and this "
          "number DE 813 992 526 and this number DE 813992526. I do not want the bank "
          "account number: IBAN DE06300501100011054517.")
get_umsatzsteuer_identifikationsnummer(string)
>>>>> ['DE813992525',
'DE813992526',
'DE 813 992 526',
'DE 813992526',
'DE063005011000']
The last item in the result is the first part of the German IBAN, which I don't want to extract. How can I avoid it?
You can shorten the alternation by making the space optional. If you don't want the last number, but you do want the number that ends with a dot, you can use a negative lookahead to assert that the match is not followed by a digit.
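The pattern itself is not shown here, so the following is only a sketch of one way to apply both suggestions. The regex and the names `VAT_ID_PATTERN`, `find_vat_ids` and `sample` are assumptions made for illustration, not the original answer's code.

```python
import re

# Assumed pattern (hypothetical, not taken from the answer):
# " ?" folds the "DE123456789" and "DE 123456789" alternatives into one,
# and (?![0-9]) rejects a match that is directly followed by another digit.
VAT_ID_PATTERN = re.compile(r'DE(?:[0-9 ]{12}| ?[0-9]{9})(?![0-9])')

def find_vat_ids(text):
    """Return every match of the assumed pattern in text."""
    return VAT_ID_PATTERN.findall(text)

sample = (
    "I want to get this DE813992525 and this DE813992526 number and this "
    "number DE 813 992 526 and this number DE 813992526. I do not want the bank "
    "account number: IBAN DE06300501100011054517."
)
print(find_vat_ids(sample))
```

With the sample string from the question this prints the four wanted numbers. The IBAN is skipped because both the 12-character and the 9-digit alternative would end directly in front of another digit.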
You could also make the pattern more precise for the third example by matching a space followed by 3 digits, repeated 3 times, because [0-9 ]{12} could also match 12 spaces.
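As a rough illustration of that difference, the sketch below compares the loose character class with a stricter grouping. Both patterns and the names `LOOSE_PATTERN`, `STRICT_PATTERN` and `find_vat_ids_strict` are assumptions for this example, not the answer's exact regex.

```python
import re

# Loose form from the question: digits and spaces in any mix, 12 characters.
LOOSE_PATTERN = re.compile(r'DE[0-9 ]{12}')

# Assumed stricter form (hypothetical): either three groups of
# "space + 3 digits", or an optional space followed by 9 digits,
# and in both cases no digit directly after the match.
STRICT_PATTERN = re.compile(r'DE(?:(?: [0-9]{3}){3}| ?[0-9]{9})(?![0-9])')

def find_vat_ids_strict(text):
    """Return every match of the assumed stricter pattern in text."""
    return STRICT_PATTERN.findall(text)

only_spaces = "DE" + " " * 12
print(LOOSE_PATTERN.findall(only_spaces))   # the loose class accepts 12 spaces
print(find_vat_ids_strict(only_spaces))     # the stricter pattern finds nothing

print(find_vat_ids_strict("DE 813 992 526 and DE 813992526. IBAN DE06300501100011054517."))
```

The stricter grouping still accepts `DE 813 992 526` and `DE 813992526`, while input consisting of `DE` and 12 spaces no longer matches.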