Pandas df manipulation: new column with list of values if other column rows repeated

Question

1.5k views. Asked by mik.ferrucci on 20 December 2016 at 15:32

I have a df like this:

ID   Cluster Product 
 1         4     'b'  
 1         4     'f'
 1         4     'w'
 2         7     'u'
 2         7     'b'
 3         5     'h'
 3         5     'f'
 3         5     'm'
 3         5     'd'
 4         7     's'
 4         7     'b'
 4         7     'g'

Here ID is the primary, unique key of another df that is the source for this one. Cluster is not a key: different IDs often share the same Cluster value, but it is information I need to carry along.

What I want to obtain is this dataframe:

ID   Cluster    Product_List_by_ID 
 1         4     ['b','f','w'] 
 2         7     ['u','b']
 3         5     ['h','f','m','d']
 4         7     ['s','b','g']

If this is not possible, a dictionary like this would also be fine:

d = {'ID': [1, 2, 3, 4], 'Cluster': [4, 7, 5, 7],
     'Product_List_by_ID': [['b', 'f', 'w'], ['u', 'b'], ['h', 'f', 'm', 'd'], ['s', 'b', 'g']]}

I have tried many approaches without success. It seems that it is not possible to store lists as values in a pandas dataframe. Still, I think there should be some trick to reach the goal. Apologies if I am missing something obvious; I am new to coding.

Any suggestions? Thanks.

There are 2 answers

jezrael On 20 December 2016 at 16:16

Another solution: first remove the ' characters from the Product column, if necessary, with str.strip:

df.Product = df.Product.str.strip("'")
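As a quick illustration of what str.strip("'") does, here is a minimal sketch on a small Series. The variable names products and cleaned are made up for this example and are not part of the question's data:

```python
import pandas as pd

# Hypothetical sample: two values wrapped in literal quote characters,
# and one value that is already clean.
products = pd.Series(["'b'", "'f'", "w"])

# str.strip("'") removes leading and trailing ' characters from each string;
# values without surrounding quotes are left unchanged.
cleaned = products.str.strip("'")

print(cleaned.tolist())
```

The step is only needed when the quote characters are really part of the stored strings; if the column already holds plain values such as b, it changes nothing.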

Then use groupby with apply; finally, if you need a dictionary, use to_dict with the parameter orient='list':

print (df.groupby(['ID', 'Cluster'])
         .Product.apply(lambda x: x.tolist())
         .reset_index()
         .to_dict(orient='list'))

{'Cluster': [4, 7, 5, 7], 
'ID': [1, 2, 3, 4], 
'Product': [['b', 'f', 'w'], ['u', 'b'], 
            ['h', 'f', 'm', 'd'], ['s', 'b', 'g']]}
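For anyone who wants to run this end to end, the following is a minimal, self-contained sketch. It assumes the Product strings contain literal quote characters, as the printed table in the question suggests; the frame is rebuilt by hand from that table:

```python
import pandas as pd

# Sample frame rebuilt from the question. Assumption: the quotes shown
# in the Product column are part of the stored strings.
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "Cluster": [4, 4, 4, 7, 7, 5, 5, 5, 5, 7, 7, 7],
    "Product": ["'b'", "'f'", "'w'", "'u'", "'b'", "'h'",
                "'f'", "'m'", "'d'", "'s'", "'b'", "'g'"],
})

# Drop the surrounding quote characters.
df["Product"] = df["Product"].str.strip("'")

# Collect the products of each (ID, Cluster) pair into a list,
# then turn the grouped result into a dict of column -> list of values.
d = (
    df.groupby(["ID", "Cluster"])["Product"]
      .apply(list)
      .reset_index()
      .to_dict(orient="list")
)

print(d)
```

Grouping on both ID and Cluster keeps Cluster in the result; since each ID has a single Cluster value here, the groups are the same as grouping by ID alone.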

piRSquared On 20 December 2016 at 15:35 BEST ANSWER

Use groupby on ID and Cluster, then apply list to the Product column:

df.groupby(['ID', 'Cluster']).Product.apply(list)

ID  Cluster
1   4               ['b', 'f', 'w']
2   7                    ['u', 'b']
3   5          ['h', 'f', 'm', 'd']
4   7               ['s', 'b', 'g']
Name: Product, dtype: object
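The result above is a Series with a two-level index. To get the dataframe layout asked for in the question, one option is to reset the index and name the list column at the same time. This is a sketch, assuming the Product values are plain strings without embedded quotes; the column name Product_List_by_ID is taken from the question:

```python
import pandas as pd

# Sample frame rebuilt from the question (plain strings, no embedded quotes).
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "Cluster": [4, 4, 4, 7, 7, 5, 5, 5, 5, 7, 7, 7],
    "Product": list("bfwubhfmdsbg"),
})

# groupby + apply(list) yields a Series indexed by (ID, Cluster);
# reset_index(name=...) turns it into a DataFrame and names the list column.
result = (
    df.groupby(["ID", "Cluster"])["Product"]
      .apply(list)
      .reset_index(name="Product_List_by_ID")
)

print(result)
```

This also shows that a dataframe column can hold Python lists: the column simply has dtype object.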
