**My goals: ** filter an inner dataframe based on the row (then the values) of an outer dataframe
Let's consider three dataframes:
- first a list of shop with detail about the shop:
Shop = pd.DataFrame({
'Shop_number': ['SH10','SH20', 'SH30','SH40' ,'SH50','SH60','SH70'],
'Shop': ['Chisinau','Milan', 'Paris','Madrid', 'Lyon','London', 'foo']
})
Shop

- then a dataframe containing the sales records:
Sales = pd.DataFrame({
'Shop_number': ['SH20', 'SH30', 'SH50','SH70'],
'Amount':[200,400,50,1000],
'Type' : 'Actual'
})
Sales
Which gives:

then a dataframe containing the budget sales records:
Budget = pd.DataFrame({
'Shop_number': ['SH10','SH20', 'SH30','SH40' ,'SH50','SH60','SH70'],
'Amount':[100,150,200,300,40,500,150],
'Type' : 'Budget'
})
print(Budget)

Let's summarize data as is:
base=pd.concat([Sales,Budget])
base
Then what I try to achieve is express Sales and Budget based on Shop table as:

(Shop
.assign(
Budget = lambda x :
base.loc[(base['Shop_number']=='SH20') & (base['Type']=='Budget')].loc[:,'Amount'].sum(),
Actual = lambda x :
base.loc[(base['Shop_number']=='SH20') & (base['Type']=='Actual')].loc[:,'Amount'].sum(),
var = lambda x :
x.Budget-x.Actual
)
)
Then it work as intended:

But when I try to make it dynamic then it fails:
(Shop
.assign(
Budget = lambda x :
base.loc[(base['Shop_number']==x.Shop_number) & (base['Type']=='Budget')].loc[:,'Amount'].sum(),
Actual = lambda x :
base.loc[(base['Shop_number']==x.Shop_number) & (base['Type']=='Actual')].loc[:,'Amount'].sum(),
var = lambda x :
x.Budget-x.Actual
)
)
and returns a trace back:
ValueError: Can only compare identically-labeled Series objects
My question is:
How would you do to make it work? Would you handle the purpose of this calculation like this or on another more optimal way?
I try what is in the description then it failed. Of course I tried some make around but it failed as well can't find the solution
The reason you got the error is because
baseandShopdon't have the same length (number of rows). When you use.assignwithlambda,xwill anonymously replace yourShopdataframe. In fact, the filtering can't be done in this case due to length mismatch (impossible comparison in general). You can try this, and you'll have the same issue.I suggest defining a function that do the filtering job for you. Assuming we have initialized all dataframe as you mentioned above :
Then you can call the function by applying it to your
groupbyand get the exact result :Output
Note that the
set_axisis just for renaming column names, you can change it and use anything else you want (like.rename).Hope this help.