Pandas: Drop all string components in a mixed typed series with integers and strings

Question


Asked by Elaine Yang on 06 July 2021 at 21:22 (810 views)

This drives me nuts. When I searched for tips on dropping elements from a DataFrame, I found nothing about mixed-type series.

Say I have this DataFrame:

import pandas as pd
df = pd.DataFrame(data={'col1': [1,2,3,4,'apple','apple'], 'col2': [3,4,5,6,7,8]})
a = df['col1']

Then 'a' is a mixed-type series with 6 elements. How can I remove all the 'apple' entries from 'a'? I need the resulting series to be 1, 2, 3, 4.
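A quick way to confirm what the series actually holds is to look at the Python type of each element. A small sketch, rebuilding the question's `df`:

```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2, 3, 4, 'apple', 'apple'], 'col2': [3, 4, 5, 6, 7, 8]})
a = df['col1']

# Mixing int and str forces pandas to store the column as generic Python objects
print(a.dtype)

# Count how many elements have each Python type
type_counts = a.map(lambda x: type(x).__name__).value_counts()
print(type_counts)
```

Every element is either an `int` or a `str`, so filtering on the element type (or on whether the value parses as a number) is enough to get 1, 2, 3, 4 back.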

Original Q&A

There are 3 answers

Golden Lion On 06 July 2021 at 21:45

You can drop rows by label, passing the index labels of the rows that hold non-numeric strings.

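import pandas as pd

df = pd.DataFrame(data={'col1': [1,2,3,4,'apple','apple'], 'col2': [3,4,5,6,7,8]})
df.reset_index(inplace=True)  # copies the old index into an 'index' column
print(df)  # printed before anything is dropped

# .str.isnumeric() returns NaN for the int entries and False for 'apple',
# so .eq(0) is True only on the rows holding non-numeric strings
grouped = df.col1.str.isnumeric().eq(0)

# index labels of the rows to drop
labels = grouped[grouped].index
if len(labels) > 0:
    df = df.drop(labels=labels, axis=0)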

Output of `print(df)`, which runs before the drop, so the `apple` rows still appear:

   index   col1  col2
0      0      1     3
1      1      2     4
2      2      3     5
3      3      4     6
4      4  apple     7
5      5  apple     8
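A condensed sketch of the same label-based drop, skipping the `reset_index` step and inspecting the frame after the drop. It relies on the same behavior as the answer: `.str.isnumeric()` yields NaN for non-string elements, and `.eq(0)` turns that NaN into False.

```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2, 3, 4, 'apple', 'apple'], 'col2': [3, 4, 5, 6, 7, 8]})

# True only on rows whose col1 value is a non-numeric string
is_bad_string = df['col1'].str.isnumeric().eq(0)

# Drop those rows by their index labels
cleaned = df.drop(index=is_bad_string[is_bad_string].index)
print(cleaned)
```

Note that `col1` keeps its object dtype after the drop; something like `cleaned['col1'].astype('int64')` would still be needed for a true integer column.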

abhishekbasu On 06 July 2021 at 21:47

You could use the `apply` method with a lambda that flags the strings and replaces them with a value like NaN, then filter the NaNs out.

import numpy as np

a = df['col1'].apply(lambda x: np.nan if isinstance(x, str) else x).dropna()

What this piece of code does is:

It first replaces all instances of strings in the column with NaN
Then drops the NaNs
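A minimal sketch that runs the two steps separately and checks the dtype at each stage, rebuilding the question's `df`. Because NaN is a float, pandas infers a float dtype for the whole column once the placeholders are in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2, 3, 4, 'apple', 'apple'], 'col2': [3, 4, 5, 6, 7, 8]})

# Step 1: replace every str element with NaN
step1 = df['col1'].apply(lambda x: np.nan if isinstance(x, str) else x)

# Step 2: drop the NaN placeholders
step2 = step1.dropna()

print(step1.dtype)
print(step2.tolist())
```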

This also avoids coercing a string element that happens to contain a valid int or float: an element like "12" is dropped as a string rather than converted to a number, assuming conversion is not the behavior you want.
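To illustrate the difference, here is a sketch on a hypothetical series (not from the question) that contains the string '12'. The type-based filter drops it, while `pd.to_numeric(..., errors='coerce')` parses it:

```python
import numpy as np
import pandas as pd

# Hypothetical series where one string holds a valid integer
s = pd.Series([1, 2, '12', 'apple'])

# isinstance-based filtering treats '12' as a string and drops it
by_type = s.apply(lambda x: np.nan if isinstance(x, str) else x).dropna()

# to_numeric with errors='coerce' parses '12' and keeps it as a number
by_value = pd.to_numeric(s, errors='coerce').dropna()

print(by_type.tolist(), by_value.tolist())
```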

Because the NaN placeholders make pandas store the column as float, if you want the final output to be of int type you could map it like so:

a = df['col1'].apply(lambda x: np.nan if isinstance(x, str) else x).dropna().map(int)
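A variation on the same `isinstance` check, offered as an alternative rather than as the answer's method: use the check as a boolean mask instead of a NaN placeholder, so no float intermediate is created. The surviving elements are still stored in an object column, so this sketch casts them explicitly with `astype('int64')`, assuming every non-string element is an integer.

```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2, 3, 4, 'apple', 'apple'], 'col2': [3, 4, 5, 6, 7, 8]})

# True for every element that is not a string
not_str = df['col1'].map(lambda x: not isinstance(x, str))

# Keep only those elements, then cast the object column to int64
a = df['col1'][not_str].astype('int64')
print(a)
```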

SeaBean On 06 July 2021 at 21:34 (accepted answer)

To retain the integers as integer type without changing them to float:

Approach: filter the rows with numeric values to keep them, instead of converting non-numeric values to NaN and then dropping the NaNs. The difference is that there is no intermediate result containing NaN, which would force the numeric values to change from integer to float.

a = pd.to_numeric(a[a.astype(str).str.isnumeric()])
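One caveat worth checking: `str.isnumeric()` is True only for strings made entirely of numeric characters, so this filter also drops negative integers and floats, and keeps numeric strings such as '12' (which `pd.to_numeric` then converts). A sketch on a hypothetical series with those edge cases:

```python
import pandas as pd

# Hypothetical series with edge cases that are not in the question
s = pd.Series([1, -3, 'apple', '12', 2.5])

# '-3' and '2.5' contain non-numeric characters ('-' and '.'), so they are filtered out
kept = pd.to_numeric(s[s.astype(str).str.isnumeric()])
print(kept)
```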

Result:

The resulting dtype remains integer (int64):

print(a)

0    1
1    2
2    3
3    4
Name: col1, dtype: int64

If you produce intermediate results with `NaN` like below:

a = pd.to_numeric(a, errors='coerce').dropna()

the resulting dtype is forced to float (float64) instead of remaining integer:

0    1.0
1    2.0
2    3.0
3    4.0
Name: col1, dtype: float64
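If the coerce-and-drop route is otherwise convenient, the float dtype can be undone afterwards. A sketch of two options, assuming the question's series: cast back with `astype('int64')` after `dropna()` (exact as long as the integers stay within ±2**53, which float64 represents exactly), or convert to pandas' nullable `Int64` dtype, which can hold missing values without switching to float.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 'apple', 'apple'], name='col1')

# Coerce non-numeric strings to NaN; this intermediate result is float64
coerced = pd.to_numeric(a, errors='coerce')

# Option 1: drop NaN, then cast back to plain int64
as_int64 = coerced.dropna().astype('int64')

# Option 2: switch to the nullable Int64 dtype first, then drop the missing values
as_nullable = coerced.astype('Int64').dropna()

print(as_int64.dtype, as_nullable.dtype)
```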
