I have a large CSV file, which is a log of caller data.
A short snippet of my file:
CompanyName High Priority QualityIssue
Customer1 Yes User
Customer1 Yes User
Customer2 No User
Customer3 No Equipment
Customer1 No Neither
Customer3 No User
Customer3 Yes User
Customer3 Yes Equipment
Customer4 No User
I want to sort the entire list by the frequency of occurrence of customers so it will be like:
CompanyName High Priority QualityIssue
Customer3 No Equipment
Customer3 No User
Customer3 Yes User
Customer3 Yes Equipment
Customer1 Yes User
Customer1 Yes User
Customer1 No Neither
Customer2 No User
Customer4 No User
I've tried groupby, but that only prints out the company name and the frequency, not the other columns. I also tried
df['Totals']= [sum(df['CompanyName'] == df['CompanyName'][i]) for i in xrange(len(df))]
and
df = [sum(df['CompanyName'] == df['CompanyName'][i]) for i in xrange(len(df))]
But these give me errors:
ValueError: The wrong number of items passed 1, indices imply 24
I've looked at something like this:
for key, value in sorted(mydict.iteritems(), key=lambda (k,v): (v,k)):
print "%s: %s" % (key, value)
but this only prints out two columns, and I want to sort my entire CSV. My output should be my entire CSV, sorted by the frequency of the values in the first column.
Thanks for the help in advance!
This seems to do what you want: basically, add a count column by performing a groupby and a transform that counts each company's occurrences, and then sort on that column.
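A minimal sketch of that approach, using the sample data from the question. The column name Count is my own choice, and I use a stable sort (kind='mergesort') so that rows with the same company keep their original relative order:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'CompanyName': ['Customer1', 'Customer1', 'Customer2', 'Customer3',
                    'Customer1', 'Customer3', 'Customer3', 'Customer3',
                    'Customer4'],
    'HighPriority': ['Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No'],
    'QualityIssue': ['User', 'User', 'User', 'Equipment', 'Neither',
                     'User', 'User', 'Equipment', 'User'],
})

# For each row, count how many times its CompanyName occurs in the frame
df['Count'] = df.groupby('CompanyName')['CompanyName'].transform('count')

# Sort descending on that count; a stable sort keeps ties in original order
df = df.sort_values('Count', ascending=False, kind='mergesort')
print(df)
```

This puts the four Customer3 rows first, then the three Customer1 rows, then Customer2 and Customer4, matching the desired output. (On very old pandas versions, `sort_values` was called `sort`.)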
You can drop the extraneous count column afterwards using df.drop.
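A small self-contained sketch of the cleanup step (the helper column name Count is my assumption from the earlier step):

```python
import pandas as pd

# A frame that already carries the helper count column
df = pd.DataFrame({'CompanyName': ['Customer3', 'Customer1'],
                   'Count': [4, 3]})

# axis=1 tells drop to remove a column rather than a row
df = df.drop('Count', axis=1)
```

In newer pandas versions, `df.drop(columns='Count')` expresses the same thing a bit more readably.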