Afer joining two data frames:
left_dict = {
'id1': [1,2,3,4,5],
'val1': [10,20,30,40,50],
'lft': ['a','b','c','d','e']
}
right_dict = {
'id1': [1,7,3,4,8,1,3],
'val2': [100,700,300,400,800,110,330],
'rgt': [1.1,2.2,3.3,4.4,5.5,6.6,7.7]
}
left = pd.DataFrame(left_dict)
right = pd.DataFrame(right_dict)
r = pd.merge(left, right, how='outer', on='id1', indicator=False)
I get resulting data frame:
id1 lft val1 rgt val2
0 1.0 a 10.0 1.1 100.0
1 1.0 a 10.0 6.6 110.0
2 2.0 b 20.0 NaN NaN
3 3.0 c 30.0 3.3 300.0
4 3.0 c 30.0 7.7 330.0
5 4.0 d 40.0 4.4 400.0
6 5.0 e 50.0 NaN NaN
7 7.0 NaN NaN 2.2 700.0
8 8.0 NaN NaN 5.5 800.0
Now I need to fold rows with the same 'id1', 'lft' and 'rgt' into one row with 'id1', 'lft', 'rgt', unchanged and add a new column 'xxx' to this data frame. Values in this column 'xxx' are computed with a function
def f(val1, val2):
if math.isnan(val2):
r = val1
else:
if math.isnan(val1):
r = val2
else:
r = val1 * 2 + val2 * 3
return r
So resulting data frame should be this:
id1 lft val1 rgt val2 xxx
0 1.0 a 10.0 1.1 100.0 320.0
1 2.0 b 20.0 NaN NaN 20.0
2 3.0 c 30.0 3.3 300.0 960.0
3 4.0 d 40.0 4.4 400.0 40.0
4 5.0 e 50.0 NaN NaN 50.0
5 7.0 NaN NaN 2.2 700.0 700.0
6 8.0 NaN NaN 5.5 800.0 800.0
I was trying to use:
In [85]: r.groupby(['id1','val1', 'lft', 'rgt']).groups
This returns a dictionary with values equal to row numbers in groups, which does not help at all. Any ideas how to actually fold and reduce rows?
will probably work, but keep in mind that for duplicated combinations you will get duplicated rows, is that the logic you are looking for?