Removing values of a certain object type from a dataframe column in Pandas

I have a pandas dataframe where some values are integers and others are arrays. I simply want to drop all of the rows that contain an array (object dtype, I believe) in my "ORIGIN_AIRPORT_ID" column, but after trying many methods I have not been able to figure out how.

Here is what the first 20 rows of my dataframe look like. The values that show up as lists are the ones I want to remove. The dataset has a couple million rows, so I need code that removes every array-like value in that specific column.

[Image: dataframe]

There are 2 answers

0
itprorh66 On
df = df[~df['ORIGIN_AIRPORT_ID'].astype(str).str.contains(',')]
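
A minimal sketch of this comma filter on a small made-up frame; the sample values and the names `has_comma` and `filtered` are assumptions for illustration, not the asker's data. It casts every cell to `str` first, on the assumption that the column mixes integers with list-like values, because `.str.contains` returns NaN for non-string cells:

```python
import pandas as pd

# Hypothetical sample: integers mixed with a list-like value stored as text
df = pd.DataFrame({
    "ITIN_ID": [20194146, 20194147, 20194148],
    "ORIGIN_AIRPORT_ID": [10397, "[10397, 10398, 10399, 10400]", 10397],
})

# Cast every cell to str so the comma test is defined for all rows
has_comma = df["ORIGIN_AIRPORT_ID"].astype(str).str.contains(",", regex=False)

# Keep only the rows whose value contains no comma
filtered = df[~has_comma]
print(filtered)
```

Note that a one-element list such as `[10397]` contains no comma, so this test would let it through.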
1
Ralubrusto On

Next time, please consider giving us a data sample as text instead of a figure. That makes it easier for us to test your example.

Original data:

    ITIN_ID             ORIGIN_AIRPORT_ID
0  20194146                         10397
1  20194147                         10397
2  20194148                         10397
3  20194149  [10397, 10398, 10399, 10400]
4  20194150                         10397

In your case, you can use the pandas function pd.to_numeric:

df['ORIGIN_AIRPORT_ID'] = pd.to_numeric(df['ORIGIN_AIRPORT_ID'], errors='coerce')

It replaces every cell that cannot be converted into a number with NaN (Not a Number), so we get:

    ITIN_ID  ORIGIN_AIRPORT_ID
0  20194146            10397.0
1  20194147            10397.0
2  20194148            10397.0
3  20194149                NaN
4  20194150            10397.0
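
To see the coercion in isolation, here is a small sketch on a standalone Series. The values and the names `col` and `converted` are made up for illustration, and the list-like entry is assumed to be stored as text. Because NaN is a float, the whole column becomes float64, which is why the integers above show up as 10397.0:

```python
import pandas as pd

# Hypothetical column: numbers plus one list-like value stored as text
col = pd.Series([10397, "10397", "[10397, 10398, 10399, 10400]"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN
converted = pd.to_numeric(col, errors="coerce")

print(converted.isna().tolist())  # which entries failed to convert
print(converted.dtype)            # float dtype, since NaN is a float
```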

To remove these rows, now just use .dropna and convert the column back to integers:

df = df.dropna(subset=['ORIGIN_AIRPORT_ID']).astype({'ORIGIN_AIRPORT_ID': 'int'})

This results in your desired DataFrame:

    ITIN_ID  ORIGIN_AIRPORT_ID
0  20194146              10397
1  20194147              10397
2  20194148              10397
4  20194150              10397
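
If the cells hold real Python lists or NumPy arrays rather than text, a different technique is to test each cell's type directly. This is only a sketch under that assumption; the sample frame and the names `is_array_like` and `clean` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample where the odd cell is a real Python list
df = pd.DataFrame({
    "ITIN_ID": [20194146, 20194149, 20194150],
    "ORIGIN_AIRPORT_ID": [10397, [10397, 10398, 10399, 10400], 10397],
})

# True for cells that hold a list, tuple, or NumPy array
is_array_like = df["ORIGIN_AIRPORT_ID"].map(
    lambda v: isinstance(v, (list, tuple, np.ndarray))
)

# Drop those rows, then give the column a plain integer dtype
clean = df[~is_array_like].astype({"ORIGIN_AIRPORT_ID": "int64"})
print(clean)
```

This variant never passes through NaN or floats, but the Python-level `map` may be slower than `pd.to_numeric` on a couple million rows.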