Finding the most frequent contestant on Survivor without duplicates


I'm trying to find the most frequent contestant on the game show Survivor, but I need to drop duplicate rows for contestants who appear more than once in the same season (re-entering the same season doesn't count as a separate appearance).

The castaways dataframe is sorted by

  • season name,
  • season,
  • castaway id,
  • full name,
  • season,
  • etc.

The castaway_details dataframe has the information

  • castaway id,
  • full name,
  • gender,
  • etc.

I use the castaways dataframe to count the number of times a contestant was on a season of Survivor; however, I want to ignore repeat rows where a contestant was on the same season multiple times.

castaway_details[castaway_details['castaway id'] == castaways['castaway id'].value_counts().idxmax()]

This gives me the wrong answer: the person it returns appears in the dataframe 6 times, but 3 of those appearances were in the same season, which I don't want counted separately.


There is 1 answer

DouxDoux
# import pandas library 
import pandas as pd

# sample data: 'jane' appears twice in season 3
data = [['jane', 3], ['jane', 3], ['karen', 10]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['name', 'season'])

# drop rows which have same name and same season 
newdf = df.drop_duplicates(
    subset=['name', 'season'],
    keep='last').reset_index(drop=True)

print(df)
print(newdf)
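
To get from the de-duplicated frame to the most frequent contestant, the same idea can be followed by a count per contestant. This is a minimal sketch, assuming the column names `castaway id`, `season`, and `full name` mentioned in the question; the sample rows are made up for illustration and are not real Survivor data.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed from the question.
castaways = pd.DataFrame({
    "castaway id": [1, 1, 1, 1, 2, 2, 2, 3],
    "season":      [1, 1, 1, 8, 2, 5, 9, 4],
})
castaway_details = pd.DataFrame({
    "castaway id": [1, 2, 3],
    "full name": ["Contestant A", "Contestant B", "Contestant C"],
})

# Keep one row per (contestant, season) pair, so re-entering
# the same season is counted only once.
unique_appearances = castaways.drop_duplicates(subset=["castaway id", "season"])

# Count the remaining rows per contestant and take the id with the highest count.
season_counts = unique_appearances["castaway id"].value_counts()
top_id = season_counts.idxmax()

# Look up that contestant in the details frame.
most_frequent = castaway_details[castaway_details["castaway id"] == top_id]
print(most_frequent)
```

In this sample, contestant 1 has the most raw rows but only two distinct seasons, so the de-duplicated count picks contestant 2 instead. An equivalent way to count distinct seasons is `castaways.groupby("castaway id")["season"].nunique().idxmax()`. Note that `idxmax()` returns only one id if several contestants are tied.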