Pandas conditional join and calculation

79 views Asked by At

I have two Pandas dataframes, df_stock_prices and df_sentiment_mean.

I would like to do the following:

  1. Left join/merge these two dataframes into one dataframe, joined by Date and by ticker. In df_stock_prices, ticker is the column name, for example AAPL.OQ and in df_sentiment_mean ticker is found within the rows of the column named ticker.

  2. If there is a Date and ticker from df_stock_prices that doesn't match df_sentiment_mean, keep the non-matching row of df_stock_prices as-is (hence the left join).

  3. When there is a match for both Date and ticker, multiply the fields together; for example in the dataframes listed below, if df_stock_prices Date is 2021-11-29 and column AAPL.OQ is a match for the df_sentiment_mean Date of 2021-11-29 and ticker AAPL.OQ, then multiply the values for the match, in this example: 160.24 * 0.163266.

If a Date and ticker from df_stock_prices doesn't match a Date and ticker value from df_sentiment_mean, keep the values from df_stock_prices.

Current dataframes:

df_stock_prices:


            AAPL.OQ  ABBV.N   ABT.N   ACN.N  ADBE.OQ  AIG.N  AMD.OQ  AMGN.OQ  
Date                                                                           
2021-11-29   160.24  116.89  128.03  365.82   687.49  54.95  161.91   203.47   
2021-11-30   165.30  115.28  125.77  357.40   669.85  52.60  158.37   198.88   
2021-12-01   164.77  115.91  126.74  360.14   657.41  51.72  149.11   200.80   
2021-12-02   163.76  116.87  128.38  365.30   671.88  53.96  150.68   201.17   
2021-12-03   161.84  118.85  130.27  361.42   616.53  53.32  144.01   202.44   
...

df_sentiment_mean:

            ticker      diff
Date                         
2021-11-29  AAPL.OQ  0.163266
2021-11-29   ABBV.N -0.165520
2021-11-29    ABT.N  0.149920
2021-11-29  ADBE.OQ -0.014639
2021-11-29    AIG.N -0.448595
...             ...       ...
2023-01-12    LOW.N  0.008863
2023-01-12    MDT.N  0.498884
2023-01-12     MO.N -0.013428
2023-01-12    NEE.N  0.255223
2023-01-12    NKE.N  0.072752

Desired dataframe, partial first row example:

df_new:

            AAPL.OQ         ABBV.N      ABT.N        ACN.N     ADBE.OQ       AIG.N  …
Date
2021-11-29  26.16174384    -19.3476328  19.1942576  365.82    -10.06416611  -24.65029525  …
...
1

There are 1 answers

8
Corralien On

You have to reshape your second dataframe then multiply both dataframes. Finally, fill nan by original values of your first dataframe:

# df1 = df_stock_prices
# df2 = df_sentiment_mean
df_new = df1.mul(df2.set_index('ticker', append=True)['diff'].unstack('ticker')).fillna(df1)
>>> df1
            AAPL.OQ  ABBV.N   ABT.N   ACN.N  ADBE.OQ  AIG.N
Date                                                       
2021-11-29   160.24  116.89  128.03  365.82   687.49  54.95


>>> df2
             ticker      diff
Date                         
2021-11-29  AAPL.OQ  0.163266
2021-11-29   ABBV.N -0.165520
2021-11-29    ABT.N  0.149920
2021-11-29  ADBE.OQ -0.014639
2021-11-29    AIG.N -0.448595


>>> df_new
              AAPL.OQ     ABBV.N      ABT.N   ACN.N    ADBE.OQ      AIG.N
Date                                                                     
2021-11-29  26.161744 -19.347633  19.194258  365.82 -10.064166 -24.650295