Sorry for the noob question.
I have a bunch of minute data like so:
timestamp open high low close volume trade_count vwap symbol volume_10_day
0 2021-06-28 11:15:00+00:00 1.8899 1.89 1.8899 1.8900 1200 2 1.889942 AADI 102027.6
1 2021-06-28 13:15:00+00:00 1.8200 1.82 1.8100 1.8100 710 8 1.818282 AADI 102027.6
2 2021-06-28 13:30:00+00:00 1.8600 1.87 1.8500 1.8600 13845 52 1.858409 AADI 102027.6
3 2021-06-28 13:35:00+00:00 1.8659 1.87 1.8626 1.8700 860 5 1.867670 AADI 102027.6
4 2021-06-28 13:40:00+00:00 1.8700 1.87 1.8477 1.8477 7386 33 1.859019 AADI 102027.6
... ... ... ... ... ... ... ... ... ... ...
423069 2021-01-13 00:35:00+00:00 1.2600 1.27 1.2500 1.2600 242391 140 1.260218 ZOM 122065.8
423070 2021-01-13 00:40:00+00:00 1.2600 1.26 1.2500 1.2600 129074 108 1.256255 ZOM 122065.8
423071 2021-01-13 00:45:00+00:00 1.2500 1.26 1.2500 1.2500 297198 151 1.253695 ZOM 122065.8
423072 2021-01-13 00:50:00+00:00 1.2600 1.26 1.2500 1.2600 223822 121 1.256325 ZOM 122065.8
423073 2021-01-13 00:55:00+00:00 1.2600 1.26 1.2500 1.2600 378248 222 1.255110 ZOM 122065.8
[423074 rows x 10 columns]
timestamp open high low close volume trade_count vwap symbol volume_10_day
2021-07-26 13:00:00+00:00 2.7799 2.78 2.68 2.7 182448 285 2.712106 SOS 6552.7
2021-07-26 13:05:00+00:00 2.7 2.71 2.68 2.69 46906 154 2.692083 SOS 6552.7
I am then grouping them by their symbols, and adding EMAs to them:
ema72 = lambda x: ta.ema(df.loc[x.index, "close"], 72)
ema89 = lambda x: ta.ema(df.loc[x.index, "close"], 89)
ema216 = lambda x: ta.ema(df.loc[x.index, "close"], 216)
ema267 = lambda x: ta.ema(df.loc[x.index, "close"], 267)
ema200 = lambda x: ta.ema(df.loc[x.index, "close"], 200)
df["EMA72"] = df.groupby(['symbol']).apply(ema72).reset_index(0,drop=True)
df["EMA89"] = df.groupby(['symbol']).apply(ema89).reset_index(0,drop=True)
df["EMA216"] = df.groupby(['symbol']).apply(ema216).reset_index(0,drop=True)
df["EMA267"] = df.groupby(['symbol']).apply(ema267).reset_index(0,drop=True)
df["EMA200"] = df.groupby(['symbol']).apply(ema200).reset_index(0,drop=True)
Which gives the error:
raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels
What am I doing wrong here?
UPDATE
I can add the 72 and 89 ema fine by doing:
df["EMA72"] = df.groupby(['symbol']).apply(ema72).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
df["EMA89"] = df.groupby(['symbol']).apply(ema89).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
Will result in:
timestamp open high low close volume trade_count vwap symbol volume_10_day EMA72 EMA89
0 2021-07-06 13:30:00+00:00 1.7200 1.7300 1.71 1.730 23342 54 1.723628 AADI 102027.6 NaN NaN
1 2021-07-06 13:35:00+00:00 1.7300 1.7300 1.72 1.720 13225 26 1.729698 AADI 102027.6 NaN NaN
2 2021-07-06 13:40:00+00:00 1.7199 1.7199 1.71 1.715 1740 12 1.712175 AADI 102027.6 NaN NaN
3 2021-07-06 13:45:00+00:00 1.7150 1.7177 1.71 1.710 1223 9 1.714411 AADI 102027.6 NaN NaN
4 2021-07-06 13:50:00+00:00 1.7100 1.7100 1.71 1.710 900 1 1.710000 AADI 102027.6 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
247074 2021-01-13 00:35:00+00:00 1.2600 1.2700 1.25 1.260 242391 140 1.260218 ZOM 122065.8 1.264493 1.258082
247075 2021-01-13 00:40:00+00:00 1.2600 1.2600 1.25 1.260 129074 108 1.256255 ZOM 122065.8 1.264370 1.258125
247076 2021-01-13 00:45:00+00:00 1.2500 1.2600 1.25 1.250 297198 151 1.253695 ZOM 122065.8 1.263976 1.257944
247077 2021-01-13 00:50:00+00:00 1.2600 1.2600 1.25 1.260 223822 121 1.256325 ZOM 122065.8 1.263867 1.257990
247078 2021-01-13 00:55:00+00:00 1.2600 1.2600 1.25 1.260 378248 222 1.255110 ZOM 122065.8 1.263761 1.258035
[247079 rows x 12 columns]
But doing this:
df["EMA72"] = df.groupby(['symbol']).apply(ema72).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
df["EMA89"] = df.groupby(['symbol']).apply(ema89).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
df["EMA216"] = df.groupby(['symbol']).apply(ema216).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
df["EMA267"] = df.groupby(['symbol']).apply(ema267).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
df["EMA200"] = df.groupby(['symbol']).apply(ema200).reset_index(0,drop=True)
df = df[~df.index.duplicated()]
Gives the error:
raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels
I have tried removing the duplicated timestamps for each symbol:
df = df.groupby(['symbol', 'timestamp'])
df = df.last().sort_index().reset_index()
247079 // before
233228 // after
But it still gives the same error
I thought it may be something to do with the functions but when I do this:
df["EMA200"] = df.groupby('symbol')['close'].rolling(200).mean()
The error persists