I'm cleaning up my email contact database with a Python script. However, after I remove the duplicates, the shape of the DataFrame stays the same, even though my check shows that there are over 600 duplicates. Please refer to the code below.
I used .drop_duplicates() to remove the duplicates and .shape to check the size again afterwards.
    import pandas as pd
    import numpy as np
    from pandas import DataFrame

    data = pd.read_csv('ToBeSort.csv')
    data.shape
    data['Last Name'].duplicated()
    dupes = data.drop_duplicates(subset=["Last Name"], keep=False)
    print(dupes.shape)
    dupes.to_csv('New.csv')
The duplicates still appear after exporting to the new CSV. The expected output is a new CSV with no duplicate emails in it.
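To make the expected behaviour concrete, here is a minimal sketch on made-up data. The "Email" column name and the sample rows are assumptions for illustration only (the real file may use a different header), and the strip/lower normalisation is just one possible way to decide when two emails count as the same. It contrasts de-duplicating on "Last Name" with keep=False, as in the code above, against de-duplicating on the email itself.

```python
import pandas as pd

# Hypothetical sample data; the "Email" column name is an assumption,
# the real CSV may label this column differently.
data = pd.DataFrame({
    "Last Name": ["Tan", "Tan", "Lee", "Lim"],
    "Email": ["a@x.com", "b@x.com", "c@x.com", " C@X.com "],
})

# Same call as in the question: keep=False drops EVERY row whose
# "Last Name" appears more than once, and it never looks at the emails.
by_last_name = data.drop_duplicates(subset=["Last Name"], keep=False)

# Build a normalised key so that stray spaces and letter case
# do not hide duplicate emails.
email_key = data["Email"].str.strip().str.lower()

# Keep the first row for each normalised email, drop the later repeats.
deduped = data.loc[~email_key.duplicated(keep="first")]

print(by_last_name.shape)
print(deduped.shape)
```

On this sample, the "Last Name" version removes both "Tan" rows even though their emails differ, and still leaves two rows with what is effectively the same email. The email-based version keeps one row per address.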