How to make regex code apply only to empty target cells

Question

Asked by Ooch on 29 March 2024 at 19:27

An example of my data

StreetAddress	City	State	Zip
1 Main St 01123	Winsted	CT
1 Main St	Winsted	CT	01123

I am trying to use regex and pandas to clean a spreadsheet. The problem I am running into is that my regex code overwrites every cell in the Zip column, even cells that already contain valid data.

I tried

df['Zip'] = df['StreetAddress'].str.extract(r'(\d{5})')
df['StreetAddress'] = df['StreetAddress'].str.replace(r'(\d{5})', '', regex=True)

which gives me

StreetAddress	City	State	Zip
1 Main St	Winsted	CT	01123
1 Main St	Winsted	CT

I was hoping for something more like this

StreetAddress	City	State	Zip
1 Main St	Winsted	CT	01123
1 Main St	Winsted	CT	01123

Original Q&A

There is 1 answer

**mozway** · Accepted Answer · 2024-03-29T19:34:13+00:00

I would use a boolean mask. This avoids overwriting existing data in rows where the address contains no ZIP code, and it is also more efficient, since the replacement only runs on the relevant rows:

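# Extract a 5-digit ZIP from each address (NaN where none is found)
add = df['StreetAddress'].str.extract(r'(\d{5})', expand=False)
# Mask of rows whose address actually contained a ZIP
m = add.notna()
# Update only those rows, then strip the ZIP from the address
df.loc[m, 'Zip'] = add[m]
df.loc[m, 'StreetAddress'] = (df.loc[m, 'StreetAddress']
                                .str.replace(r' *\d{5}', '', regex=True)
                             )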
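
The mask above selects every row whose address contains a five-digit run, whether or not `Zip` is already filled. If some rows might carry a ZIP in both columns, one possible variation (a sketch, not part of the original answer) also requires the target cell to be empty. It assumes empty cells are stored as NaN/None rather than empty strings; the third row and its values are hypothetical, added only to show the difference.

```python
import pandas as pd

# Sample data modeled on the question; the third row is a hypothetical
# case where both the address and the Zip column hold a value.
df = pd.DataFrame({
    "StreetAddress": ["1 Main St 01123", "1 Main St", "5 Elm St 06098"],
    "City": ["Winsted", "Winsted", "Winsted"],
    "State": ["CT", "CT", "CT"],
    "Zip": [None, "01123", "06001"],
})

# 5-digit run found in each address (NaN where there is none).
found = df["StreetAddress"].str.extract(r"(\d{5})", expand=False)

# Update a row only if a ZIP was found AND the target cell is empty.
m = found.notna() & df["Zip"].isna()

df.loc[m, "Zip"] = found[m]
df.loc[m, "StreetAddress"] = df.loc[m, "StreetAddress"].str.replace(
    r" *\d{5}", "", regex=True
)

print(df)
```

With this mask the third row is left untouched: its existing `Zip` is kept and the digits stay in `StreetAddress`. If blank cells are empty strings instead of NaN, they would need to be converted first (for example with `df['Zip'].replace('', pd.NA)`).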

Alternatively:

df['Zip'] = df['StreetAddress'].str.extract(r'(\d{5})', expand=False).fillna(df['Zip'])

Or, as suggested by @ouroboros1, to keep the original Zip wherever one already exists (fillna returns a new Series, so assign the result back):

df['Zip'] = df['Zip'].fillna(df['StreetAddress'].str.extract(r'(\d{5})', expand=False))

Output:

  StreetAddress     City State    Zip
0     1 Main St  Winsted    CT  01123
1     1 Main St  Winsted    CT  01123
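
The two `fillna` variants differ only in which value wins when a row has a ZIP in both places. A small sketch of that difference follows; the third row and its conflicting values are hypothetical, and missing Zips are assumed to be NaN/None.

```python
import pandas as pd

# Hypothetical third row: the address and the Zip column disagree.
df = pd.DataFrame({
    "StreetAddress": ["1 Main St 01123", "1 Main St", "2 Oak Ave 06098"],
    "Zip": [None, "01123", "99999"],
})

found = df["StreetAddress"].str.extract(r"(\d{5})", expand=False)

# Variant A: the extracted value wins; the existing Zip only fills gaps.
prefer_extracted = found.fillna(df["Zip"])

# Variant B: the existing Zip wins; the extracted value only fills gaps.
keep_original = df["Zip"].fillna(found)

print(pd.DataFrame({
    "prefer_extracted": prefer_extracted,
    "keep_original": keep_original,
}))
```

Neither variant removes the digits from `StreetAddress`; that still needs the separate `str.replace` step.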
