I have been playing around with regular expressions, but haven't had any luck yet. I need to introduce some address validation. I need to make sure that a user-defined address matches this format:
"717 N 2ND ST, MANKATO, MN 56001"
or possibly this one too:
"717 N 2ND ST, MANKATO, MN, 56001"
and to throw everything else out and alert the user that it is the improper format. I have been looking at the documentation and have tried and failed with many regular expression patterns. I have tried this (and many variations) without any luck:
pat = r'\d{1,6}(\w+),\s(w+),\s[A-Za-z]{2}\s{1,6}'
This one works, but it allows too much junk because it is only making sure it starts with a house number and ends with a zip code (I think):
pat = r'\d{1,6}( \w+){1,6}'
The comma placement is crucial as I am splitting the input string by comma so my first item is the address, then city, then the state and zip are split by a space (here I would like to use a second regex in case they have a comma between state and zip).
Essentially I would like to do this:
import re

# check for this format "717 N 2ND ST, MANKATO, MN 56001"
pat_1 = 'regex to match above pattern'
# check for this pattern "717 N 2ND ST, MANKATO, MN, 56001"
pat_2 = 'regex to match above format'

if re.match(pat_1, addr, re.IGNORECASE):
    pass  # extract address
elif re.match(pat_2, addr, re.IGNORECASE):
    pass  # extract address
else:
    raise ValueError('"{}" must match this format: "717 N 2ND ST, MANKATO, MN 56001"'.format(addr))

# do stuff with address
If anyone could help me with forming a regex to make sure there is a pattern match, I would greatly appreciate it!
Here's one that might help. Whenever possible, I prefer to use verbose regular expressions with embedded comments, for maintainability.
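As an illustration, a verbose pattern for the two formats in the question might look like the sketch below. The group names (street, city, state, zip), the five-digit ZIP rule, and the helper name parse_address are assumptions made for this example, not a definitive address grammar.

```python
import re

# Under re.VERBOSE, whitespace and "#" comments inside the pattern are ignored.
ADDRESS_RE = re.compile(r"""
    ^\s*
    (?P<street>\d{1,6}(?:\s+\w+)+)         # house number, then one or more words
    \s*,\s*                                # comma between street and city
    (?P<city>[A-Za-z]+(?:\s+[A-Za-z]+)*)   # city, possibly several words
    \s*,\s*                                # comma between city and state
    (?P<state>[A-Za-z]{2})                 # two-letter state code
    (?:\s*,\s*|\s+)                        # a comma, or just whitespace, before the ZIP
    (?P<zip>\d{5})                         # five-digit ZIP (assumption)
    \s*$
""", re.VERBOSE | re.IGNORECASE)


def parse_address(addr):
    """Return the named parts of addr as a dict, or raise ValueError."""
    match = ADDRESS_RE.match(addr)
    if match is None:
        raise ValueError(
            '"{}" must match this format: '
            '"717 N 2ND ST, MANKATO, MN 56001"'.format(addr)
        )
    return match.groupdict()


parts = parse_address("717 N 2ND ST, MANKATO, MN, 56001")
print(parts["street"], "|", parts["city"], "|", parts["state"], "|", parts["zip"])
```

Because the separator before the ZIP accepts either a comma or plain whitespace, a single pattern covers both formats and the second regex from the question is not needed.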
Also note the use of (?P<name>pattern). This helps to document the intent of the match, and also provides a useful mechanism to extract the data, if your needs go beyond simple regex validation.
To make certain fields optional, we replace + with *, since + means ONE-or-more, and * means ZERO-or-more. Here is a version that matches the new requirements in the comments:
Next, consider the OR operator, |, and the non-capturing group operator, (?:pattern). Together, they can describe complex alternatives in the input format. This version matches the new requirement that some addresses have the direction before the street name, and some have the direction after the street name, but no address has the direction in both places.
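A minimal sketch of how optional parts and alternatives could be combined for the street portion alone is shown below. It assumes the direction is a single letter (N, S, E, or W) and that a lone direction letter never appears as part of the street name; the group names and the helper split_street are hypothetical.

```python
import re

STREET_RE = re.compile(r"""
    ^
    (?P<number>\d{1,6})                 # house number
    \s+
    (?:                                 # non-capturing group around the alternatives
        (?P<pre_dir>[NSEW])\s+          # direction first, e.g. 'N 2ND ST'
        (?P<name_after_dir>
            (?![NSEW]\b)\w+             # a word that is not a lone direction letter
            (?:\s+(?![NSEW]\b)\w+)*     # "*": zero or more further words
        )
      |                                 # OR
        (?P<name_before_dir>
            (?![NSEW]\b)\w+
            (?:\s+(?![NSEW]\b)\w+)*
        )
        (?:\s+(?P<post_dir>[NSEW]))?    # direction last, e.g. '2ND ST N'; "?" makes it optional
    )
    $
""", re.VERBOSE | re.IGNORECASE)


def split_street(street):
    """Return number, direction and name, or None if the street does not match."""
    m = STREET_RE.match(street.strip())
    if m is None:
        return None
    return {
        "number": m.group("number"),
        # Only one of the two direction groups can have matched.
        "direction": m.group("pre_dir") or m.group("post_dir"),
        "name": m.group("name_after_dir") or m.group("name_before_dir"),
    }


for s in ("717 N 2ND ST", "717 2ND ST N", "717 MAIN ST", "717 N 2ND ST S"):
    print(s, "->", split_street(s))
```

The negative lookahead (?![NSEW]\b) is what rejects a direction in both places: a lone direction letter can never be consumed as a street-name word, so "717 N 2ND ST S" matches neither alternative.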