Python Pandas Dataframe: group rows and reduce groups with custom function

645 views Asked by At

Afer joining two data frames:

left_dict = {                                                                                  
    'id1': [1,2,3,4,5],                                                                        
    'val1': [10,20,30,40,50],                                                                  
    'lft': ['a','b','c','d','e']                                                               
}                                                                                              

right_dict = {                                                                                 
    'id1': [1,7,3,4,8,1,3],                                                                    
    'val2': [100,700,300,400,800,110,330],                                                     
    'rgt': [1.1,2.2,3.3,4.4,5.5,6.6,7.7]                                                       
}                                                                                              

left = pd.DataFrame(left_dict)                                                                 
right = pd.DataFrame(right_dict)                                                               

r = pd.merge(left, right, how='outer', on='id1', indicator=False)

I get resulting data frame:

   id1  lft  val1  rgt   val2                                                                  
0  1.0    a  10.0  1.1  100.0                                                                  
1  1.0    a  10.0  6.6  110.0                                                                  
2  2.0    b  20.0  NaN    NaN                                                                  
3  3.0    c  30.0  3.3  300.0                                                                  
4  3.0    c  30.0  7.7  330.0                                                                  
5  4.0    d  40.0  4.4  400.0                                                                  
6  5.0    e  50.0  NaN    NaN                                                                  
7  7.0  NaN   NaN  2.2  700.0                                                                  
8  8.0  NaN   NaN  5.5  800.0                                                                  

Now I need to fold rows with the same 'id1', 'lft' and 'rgt' into one row with 'id1', 'lft', 'rgt', unchanged and add a new column 'xxx' to this data frame. Values in this column 'xxx' are computed with a function

def f(val1, val2):
    if math.isnan(val2):
        r = val1
    else:
        if math.isnan(val1):
            r = val2
    else:
        r = val1 * 2 + val2 * 3
    return r

So resulting data frame should be this:

   id1  lft  val1  rgt   val2 xxx                                                              
0  1.0    a  10.0  1.1  100.0 320.0                                                            
1  2.0    b  20.0  NaN    NaN 20.0                                                             
2  3.0    c  30.0  3.3  300.0 960.0                                                            
3  4.0    d  40.0  4.4  400.0 40.0                                                             
4  5.0    e  50.0  NaN    NaN 50.0                                                             
5  7.0  NaN   NaN  2.2  700.0 700.0                                                            
6  8.0  NaN   NaN  5.5  800.0 800.0                                                            

I was trying to use:

In [85]: r.groupby(['id1','val1', 'lft', 'rgt']).groups

This returns a dictionary with values equal to row numbers in groups, which does not help at all. Any ideas how to actually fold and reduce rows?

1

There are 1 answers

0
Ezer K On
r['xxx'] = [f(x[1]['val1'],x[1]['val2']) for x in r.iterrows()]

will probably work, but keep in mind that for duplicated combinations you will get duplicated rows, is that the logic you are looking for?