How to count the number of occurrences in a DataFrame?

I have a pandas DataFrame like this:

Shown ID                                       Bought ID
59,60,61,62,60,63,64,65,66,61,67,68,67         67,60,63
63,64,63,64,63,65,66                           0
87,63,84,63,86                                 86

For each row of the "Shown ID" column, I need to count how many times each of its numbers occurs in the whole "Shown ID" column.

So the expected result for "Shown ID" column is:

    [[('59', 1), ('60', 2), ('61', 2), ('62', 1), ('63', 6),
      ('64', 3), ('65', 2), ('66', 2), ('67', 2), ('68', 1)],
     [('63', 6), ('64', 3), ('65', 2), ('66', 2)],
     [('87', 1), ('63', 6), ('84', 1), ('86', 1)]]

How can I do that?

Then, for each row of the "Shown ID" column, I need a list of its values sorted by those counts (one sorted list per inner list of the result above).

So the final result must be:

[['63', '64', '60', '61', '65', '66', '67', '68', '59', '62'],
 ['63', '64', '65', '66'],
 ['63', '87', '84', '86']]

How can I do that? If several numbers have the same number of occurrences, they must be sorted by order of appearance in the row (the earlier a number appears in the row, the higher its priority).

There are 2 answers

Vikash Singh (BEST ANSWER)

This is how you can get what you are looking for:

import pandas as pd
from collections import Counter

# Each row holds one list of IDs.
data = [{'c_id': [59, 60, 61, 62, 60, 63, 64, 65, 66, 61, 67, 68, 67]},
        {'c_id': [63, 64, 63, 64, 63, 65, 66]},
        {'c_id': [87, 63, 84, 63, 86]}]

df = pd.DataFrame(data)

# Per-row ordering: the distinct values of each row, most frequent within that row first.
df['c_id'].apply(lambda values: [key for key, _ in Counter(values).most_common()])

output:

0    [67, 60, 61, 64, 65, 66, 68, 59, 62, 63]
1                            [63, 64, 65, 66]
2                            [63, 84, 86, 87]

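Values which have the same count might come out in any order on older Python versions. Since Python 3.7, Counter.most_common() lists values with equal counts in the order they were first encountered.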
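
If the tie order matters, as it does in the question, it can be made explicit instead of being left to Counter. This is a minimal sketch, assuming the tie rule from the question (the value that appears earlier in the row wins); `order_by_row_frequency` is a hypothetical helper name, not part of pandas or collections:

```python
from collections import Counter


def order_by_row_frequency(values):
    """Distinct values of one row, most frequent first.

    Ties are broken by the position at which a value first appears.
    (Hypothetical helper, written for illustration only.)
    """
    counts = Counter(values)
    first_seen = {}
    for position, value in enumerate(values):
        # setdefault keeps the earliest position for each value
        first_seen.setdefault(value, position)
    # Sort by descending count, then by ascending first position.
    return sorted(counts, key=lambda v: (-counts[v], first_seen[v]))


row = [59, 60, 61, 62, 60, 63, 64, 65, 66, 61, 67, 68, 67]
print(order_by_row_frequency(row))
# [60, 61, 67, 59, 62, 63, 64, 65, 66, 68]
```

It can be applied per row with df['c_id'].apply(order_by_row_frequency).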

If you want the counts taken over the whole column instead of per row, you can do it like this:

import operator

# Collect every value from every row so occurrences are counted column-wide.
all_cids = []
for index, row in df.iterrows():
    all_cids.extend(row['c_id'])

counter_obj = Counter(all_cids)

def get_ordered_values(values):
    # Pair each distinct value (in order of first appearance) with its column-wide count.
    new_values = []
    covered_values = set()
    for val in values:
        if val in covered_values:
            continue
        covered_values.add(val)
        new_values.append((val, counter_obj[val]))
    # list.sort is stable, so values with equal counts keep their order of appearance.
    new_values.sort(key=operator.itemgetter(1), reverse=True)
    return [key for key, val in new_values]

df['c_id'].apply(get_ordered_values)

output

0    [63, 64, 60, 61, 65, 66, 67, 59, 62, 68]
1                            [63, 64, 65, 66]
2                            [63, 87, 84, 86]
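
The question stores each row as a comma-separated string rather than a list. The following is a rough sketch of the same column-wide approach starting from that form, assuming the column is named "Shown ID" as in the question and that pandas 0.25 or later is available (for Series.explode); `rank_row`, `pairs` and `ranked` are hypothetical names used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Shown ID": [
        "59,60,61,62,60,63,64,65,66,61,67,68,67",
        "63,64,63,64,63,65,66",
        "87,63,84,63,86",
    ]
})

# Split each comma-separated string into a list of ID strings.
shown = df["Shown ID"].str.split(",")

# Count every ID across the whole column (one row per ID after explode).
column_counts = shown.explode().value_counts()


def rank_row(ids):
    """Distinct IDs of one row, ordered by column-wide count, descending.

    sorted() is stable, so IDs with equal counts keep the order in which
    they first appear in the row.
    """
    distinct = list(dict.fromkeys(ids))  # drops duplicates, keeps first-appearance order
    return sorted(distinct, key=lambda i: -column_counts[i])


# (ID, column-wide count) pairs per row, in order of first appearance.
pairs = [[(i, int(column_counts[i])) for i in dict.fromkeys(ids)] for ids in shown]

# Per-row ranking by column-wide count.
ranked = shown.apply(rank_row).tolist()

print(pairs)
print(ranked)
```

Under the tie rule stated in the question, the first row ends with '59', '62', '68', because those three IDs each occur once and appear in that order.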

Danilo

If I understand correctly, you want the number of occurrences of each value, not the list of indexes where each value is found. I can think of several ways of doing this:

  1. Count the data:

If your data is a flat (not multidimensional) list, you can simply use the list's count method.

# range() is lazy in Python 3, so convert to lists before concatenating
someList = list(range(3)) + list(range(2)) + list(range(1))

sortedElements = sorted(set(someList))  # set() drops duplicates; the elements must be hashable

for x in sortedElements:
    # list.count(element) is usable in Python 2.7 and Python 3.5
    # (tested myself in the interpreter; I cannot speak for IronPython and/or Rhino environments)
    print(x, someList.count(x))  # the element and its number of occurrences

  2. Returning the indexes of duplicates:

    # someList same as before
    # sortedElements same as before
    for x in sortedElements:
        # enumerate() yields every position; list.index() would return only the first match
        lIndexes = [i for i, elem in enumerate(someList) if elem == x]
        print(lIndexes)
    
  3. Multidimensional list:

As I see it, you must either flatten the whole structure into one list first, or apply step 1 or 2 to each child list of the multidimensional list, depending on your need.
Of course there are several ways to traverse a multidimensional list: you can map, filter, or reduce over it, or pass the child lists as *arguments, etc. (there are too many ways for me to count, and you can find most of them on this website), but the right method depends closely on your actual data. Without further explanation I would not dare to advise you, since it could do more harm than good.
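
As one illustration of the two options above (flattening first, or counting per child list), here is a small sketch using only the standard library. The `nested` data is made up for the example, and collections.Counter is swapped in for list.count:

```python
from collections import Counter
from itertools import chain

# Made-up two-level list, mirroring range(3), range(2), range(1)
nested = [[0, 1, 2], [0, 1], [0]]

# Option A: flatten into one list, then count over everything.
flat = list(chain.from_iterable(nested))
overall = Counter(flat)

# Option B: count inside each child list separately.
per_child = [Counter(child) for child in nested]

print(overall)
print(per_child)
```

chain.from_iterable flattens only one level, so deeper nesting would need a different traversal.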

Hope this helps.