I am looking to convert ZIP+4 codes into 5-digit ZIP codes in a pandas DataFrame. I want to detect when a ZIP+4 code is present and keep just the first five digits. Effectively, I want to do something like the code below (although it doesn't work in this form):
df.replace('^(\d{5}-?\d{4})', group(1), regex=True)
The following code does this for a list; I'm looking to do the same thing in the DataFrame.
import re

my_input = ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']
expression = re.compile(r'^(\d{5})-?(\d{4})?$')
my_output = []
for string in my_input:
    # Keep only the first five digits when the string is a ZIP or ZIP+4 code
    if m := re.match(expression, string):
        my_output.append(m.group(1))
    else:
        my_output.append(string)
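You can use DataFrame.replace with regex=True and a backreference to the first capturing group as the replacement:
    df = df.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)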
See the regex demo.
Details:
^ - start of string
(\d{5}) - Group 1 (\1): five digits
-? - an optional hyphen (-)
\d{4} - any four digits
$ - end of string.
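As a minimal sketch of how the pattern described above might be applied, here is a small example on made-up data. The column name "zip" and the sample values are assumptions for illustration (the values are borrowed from the list in the question); the pattern is the one broken down in the details.

```python
import pandas as pd

# Hypothetical sample data; the column name "zip" is an assumption.
df = pd.DataFrame(
    {"zip": ["01234-5678", "012345678", "01234", "A1A 1A1", "A1A1A1"]}
)

# Five digits (Group 1), an optional hyphen, then four digits, anchored
# at both ends so only full ZIP+4 values match.
pattern = r"^(\d{5})-?\d{4}$"

# regex=True makes replace treat the pattern as a regular expression;
# r"\1" substitutes the text captured by Group 1.
result = df.replace(pattern, r"\1", regex=True)

# The same idea restricted to a single column.
zip_only = df["zip"].str.replace(pattern, r"\1", regex=True)

print(result)
```

Values that do not match the pattern, such as plain 5-digit codes or the Canadian-style postal codes, are left unchanged because the replacement only applies where the whole string matches.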