How to identify records in a DataFrame (Python/Pandas) that contains leading or trailing spaces

Question

How to identify records in a DataFrame (Python/Pandas) that contains leading or trailing spaces

497 views Asked by mmunozjr At 03 September 2022 at 16:46

I would like to know how to write a formula that would identify/display records of string/object data type on a Pandas DataFrame that contains leading or trailing spaces.

The purpose for this is to get an audit on a Jupyter notebook of such records before applying any strip functions.

The goal is for the script to identify these records automatically without having to type the name of the columns manually. The scope should be any column of str/object data type that contains a value that includes either a leading or trailing spaces or both.

Please notice. I would like to see the resulting output in a dataframe format.

Thank you!

Link to sample dataframe data

Original Q&A

There are 1 answers

**mozway** · Answer 1 · 2022-09-03T16:50:05+00:00

You can use:

df['col'].str.startswith(' ')
df['col'].str.endswith(' ')

or with a regex:

df['col'].str.match(r'\s+')
df['col'].str.contains(r'\s+$')

Example:

df = pd.DataFrame({'col': [' abc', 'def', 'ghi ', ' jkl ']})

df['start'] = df['col'].str.startswith(' ')
df['end'] = df['col'].str.endswith(' ')
df['either'] = df['start'] | df['stop']

     col  start    end  either
0    abc   True  False    True
1    def  False  False   False
2   ghi   False   True    True
3   jkl    True   True    True

However, this is likely not faster than directly stripping the spaces:

df['col'] = df['col'].str.strip()

   col
0  abc
1  def
2  ghi
3  jkl

updated answer

To detect the columns with leading/traiing spaces, you can use:

cols = df.astype(str).apply(lambda c: c.str.contains(r'^\s+|\s+$')).any()
cols[cols].index

example on the provided link:

Index(['First Name', 'Team'], dtype='object')

TechQA.

How to identify records in a DataFrame (Python/Pandas) that contains leading or trailing spaces

There are 1 answers

updated answer

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in SPACES

Popular Questions

Trending Questions