Pandas Dataframe (Object) - Count occurrence of values and and group in columns

58 views Asked by At

in the Dataframe I have the field Equipment. In this field are comma sepperated values. Now I like to count the occurrence of each value and create a column based on this value. For example:

Col1 ABS
aaa 1
bbb 1
ccc 0
ddd 0
eee 0
import pandas as pd

data = {'Equipment': ['ABS, Android, Zentralverriegelung', 'Xenon, ABS, Apple', 'Android, Hupe, Blinker', 'Radio, CD', '', 'ABS, Android, Zentralverriegelung'],
        'Col1': ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa'],
        'Col2': ['bbb', 'bbb', 'bbb', 'bbb', 'bbb', 'bbb'],
        'Col3': ['ccc', 'ccc', 'ccc', 'ccc', 'ccc', 'ccc'],
        'Col4': ['ddd', 'ddd', 'ddd', 'ddd', 'ddd', 'ddd'],
        'Col5': ['eee', 'eee', 'eee', 'eee', 'eee', 'eee'],
        }

df = pd.DataFrame(data)


split_data = df['Equipment'].str.split(',', expand=True)
split_data['index'] = df.index
result_data = pd.get_dummies(split_data.explode('index')).groupby(level=0).agg(sum)

print(result_data)

My Result is: enter image description here

But I was expecting something like this: enter image description here

BR

2

There are 2 answers

2
Corralien On BEST ANSWER

Maybe you are looking for str.get_dummies:

>>> df['Equipment'].str.get_dummies(sep=', ')

   ABS  Android  Apple  Blinker  CD  Hupe  Radio  Xenon  Zentralverriegelung
0    1        1      0        0   0     0      0      0                    1
1    1        0      1        0   0     0      0      1                    0
2    0        1      0        1   0     1      0      0                    0
3    0        0      0        0   1     0      1      0                    0
4    0        0      0        0   0     0      0      0                    0
5    1        1      0        0   0     0      0      0                    1
0
mozway On

If you really want to count all occurrences (i.e have 2 for ABS, ABS):

out = df.join(df['Equipment']
 .str.split(r',\s*').explode()
 .groupby(level=0).value_counts()
 .unstack(fill_value=0).drop(columns='', errors='ignore')
 )

Or:

out = df.join(pd.get_dummies(df['Equipment']
                             .str.split(r',\s*').explode())
                .groupby(level=0).sum()
              )

Output:

                           Equipment Col1 Col2 Col3 Col4 Col5  ABS  Android  \
0  ABS, Android, Zentralverriegelung  aaa  bbb  ccc  ddd  eee    1        1   
1                  Xenon, ABS, Apple  aaa  bbb  ccc  ddd  eee    1        0   
2             Android, Hupe, Blinker  aaa  bbb  ccc  ddd  eee    0        1   
3                          Radio, CD  aaa  bbb  ccc  ddd  eee    0        0   
4                                     aaa  bbb  ccc  ddd  eee    0        0   
5  ABS, Android, Zentralverriegelung  aaa  bbb  ccc  ddd  eee    1        1   
   Apple  Blinker  CD  Hupe  Radio  Xenon  Zentralverriegelung  
0      0        0   0     0      0      0                    1  
1      1        0   0     0      0      1                    0  
2      0        1   0     1      0      0                    0  
3      0        0   1     0      1      0                    0  
4      0        0   0     0      0      0                    0  
5      0        0   0     0      0      0                    1