Perform some operation if 2 pandas dataframe have same entries in python

69 views Asked by At

I have 2 dataframes (purchase and sales) as follows:

PURCHASE:

Name item voucher Amt Qty
A Item1 Purchase 10000 100
B Item2 Purchase 500 50
B Item1 Purchase 2000 20
C Item3 Purchase 1000 100
D Item4 Purchase 500 100
A Item3 Purchase 5000 50

SALES:

Name item voucher Amt Qty
A Item1 Sales 5300 50
B Item2 Sales 450 40
B Item1 Sales 1675 15
C Item3 Sales 1800 100

I want an output dataframe where if the person (Name) sells an item, the Amt and Qty should be deducted from the purchase dataframe and a new dataframe should be created with the remaining Amt and Qty as shown below:

OUTPUT DATAFRAME:

Name item voucher Amt Qty
A Item1 Remaining 4700 50
A Item3 Remaining 5000 50
B Item2 Remaining 50 10
B Item1 Remaining 325 5
C Item3 Remaining -800 0
D Item4 Remaining 500 100

Notice that whatever items have been sold by a person (Name) has been deducted from the purchase dataframe and the remaining items (Amt and Qty) are stored in a new output dataframe. Also person D never sold any items even then it should be included in the output dataframe.

Thanks in advance!

Dataframe

import pandas as pd

Purchases = {
    "Name": ["A", "B", "B", "C", "D", "A"],
    "item": ["Item1", "Item2", "Item1", "Item3", "Item4", "Item3"],
    "voucher": ["Purchase", "Purchase", "Purchase", "Purchase", "Purchase", "Purchase"],
    "Amt": [10000, 500, 2000, 1000, 500, 5000],
    "Qty": [100, 50, 20, 100, 100, 50],
}

Purchases = pd.DataFrame(Purchases)

Sales = {
    "Name": ["A", "B", "B", "C"],
    "item": ["Item1", "Item2", "Item1", "Item3"],
    "voucher": ["Sales", "Sales", "Sales", "Sales"],
    "Amt": [5300, 450, 1675, 1800],
    "Qty": [50, 40, 15, 100],
}

Sales = pd.DataFrame(Sales)
3

There are 3 answers

3
mozway On BEST ANSWER

Using good old index alignment:

tmp = Purchases.set_index(['Name', 'item'])
out = (tmp
       .sub(Sales.set_index(['Name', 'item'])[['Amt', 'Qty']])
       .combine_first(tmp).assign(voucher='Remaining')
       .reset_index()[Purchases.columns]
      )

Output:

  Name   item    voucher     Amt    Qty
0    A  Item1  Remaining  4700.0   50.0
1    A  Item3  Remaining  5000.0   50.0
2    B  Item1  Remaining   325.0    5.0
3    B  Item2  Remaining    50.0   10.0
4    C  Item3  Remaining  -800.0    0.0
5    D  Item4  Remaining   500.0  100.0
3
Laurent B. On
Purchases = Purchases.set_index(['Name', 'item'])
Sales = Sales.set_index(['Name', 'item'])
Purchases['Amt'].update(Purchases['Amt'].sub(Sales['Amt']))
Purchases['Qty'].update(Purchases['Qty'].sub(Sales['Qty']))
Purchases = Purchases.reset_index().sort_values(by=['Name'])

print(Purchases)
  Name   item   voucher   Amt  Qty
0    A  Item1  Purchase  4700   50
5    A  Item3  Purchase  5000   50
1    B  Item2  Purchase    50   10
2    B  Item1  Purchase   325    5
3    C  Item3  Purchase  -800    0
4    D  Item4  Purchase   500  100
0
Corralien On

You can use merge:

# dfP = PURCHASE dataframe
# dfS = SALES dataframe
out = (Purchases.merge(Sales.drop(columns='voucher'), on=['Name', 'item'],
                 suffixes=(None, '_'), how='left')
                .assign(Amt=lambda x: x['Amt'] - x.pop('Amt_').fillna(0).astype(int),
                        Qty=lambda x: x['Qty'] - x.pop('Qty_').fillna(0).astype(int),
                        voucher='Remaining'))

Output:

>>> out
  Name   item    voucher   Amt  Qty
0    A  Item1  Remaining  4700   50
1    B  Item2  Remaining    50   10
2    B  Item1  Remaining   325    5
3    C  Item3  Remaining  -800    0
4    D  Item4  Remaining   500  100
5    A  Item3  Remaining  5000   50