drop duplicates in list within data frames python

I have a dataframe grouped by textbook ISBN, with columns listing the schools, states and grades those books are used in. I want to remove the duplicates within the lists inside the dataframe. As a test I tried a number of approaches on the State column, but I'm not sure whether each entry is a list, a dataframe or a series, and none of the code I tried worked. Could someone explain the structure of these "lists" within a dataframe, and suggest code to drop the duplicates?

1 Answer

Tanishq Chaudhary

df['State'] is a <class 'pandas.core.series.Series'>. However, each element of that Series is a list, because you converted it to one during aggregation. So when you .apply() a lambda to df['State'], each x the lambda receives is a plain Python list, not a Series; that is why x.drop_duplicates() fails, since lists have no such method.
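
A quick check makes this structure visible. Here is a minimal sketch (the frame below is a hypothetical stand-in for your grouped dataframe, not your actual data):

import pandas as pd

# Hypothetical stand-in: a column whose cells each hold a plain list.
df = pd.DataFrame({"State": [["NY", "NY", "CA"], ["TX"]]})

print(type(df["State"]))          # <class 'pandas.core.series.Series'>
print(type(df["State"].iloc[0]))  # <class 'list'>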

You can .apply() the lambda x: list(set(x)) instead of lambda x: x.drop_duplicates(). It does the same job of removing duplicates, since a set keeps only unique elements.

Example:

import pandas as pd

df = pd.DataFrame(
    {
        "val": [1, 1, 2, 3, 4, 3, 2],
        "data": ["X", "Y", "X", "X", "X", "X", "X"],
    }
)

# Aggregate each group into a list, so every cell of "data" holds a list.
df = df.groupby(["val"]).agg(lambda x: x.tolist())
print(type(df["data"]))  # the column itself is still a Series

# Each x here is one of those lists; a set round-trip removes duplicates.
print(df["data"].apply(lambda x: list(set(x))))

Output:

<class 'pandas.core.series.Series'>
val
1    [Y, X]
2       [X]
3       [X]
4       [X]
Name: data, dtype: object
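
One caveat: set() does not preserve the original order of the values. If order matters, here is a small sketch (my addition, reusing the same sample frame) that deduplicates with dict.fromkeys(), whose keys are unique and keep insertion order:

import pandas as pd

df = pd.DataFrame(
    {
        "val": [1, 1, 2, 3, 4, 3, 2],
        "data": ["X", "Y", "X", "X", "X", "X", "X"],
    }
)
df = df.groupby(["val"]).agg(lambda x: x.tolist())

# dict.fromkeys() drops duplicates while keeping first-seen order,
# so group 1 stays [X, Y] rather than whatever order the set yields.
print(df["data"].apply(lambda x: list(dict.fromkeys(x))))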