I'm working on an anomaly detection model for CPU data and my current dataframes are structured like this (with 10k+ servers and 168 hours):
Server | Hour1 | Hour2 | Hour3 |
---|---|---|---|
server1 | val1.1 | val1.2 | val1.3 |
server2 | val2.1 | val2.2 | val2.3 |
I need it to be in a structure like this:
Server | time | Value |
---|---|---|
server1 | 0 | value0 |
server1 | 1 | value1 |
server2 | 0 | value0 |
server2 | 1 | value1 |
The problem is there are 10k+ servers and 168 hourly values for each, so it's taking an eternity to iterate over. Is there a more efficient way to do this transformation?
My current attempt is creating a new df with nested for loops like so:
```python
for index, row in df.iterrows():
    for n in range(0, 167):
        new_df.loc[len(new_df.index)] = row
        new_df.iat[len(new_df.index) - 1, 2] = n

for index, row in new_df.iterrows():
    for i, r in df.iterrows():
        new_df_ts = row[2]
        if row[0] == r[0]:
            new_df.iat[index, 3] = df.iat[i, 2 + new_df_ts]
```
Use `wide_to_long`, which reshapes all of the `Hour` columns into the long format in a single vectorized call.
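A minimal sketch of the `wide_to_long` approach, using a made-up two-server frame with only three hourly columns (the real data would have `Hour1`…`Hour168`):

```python
import pandas as pd

# Hypothetical sample matching the question's wide layout.
df = pd.DataFrame({
    "Server": ["server1", "server2"],
    "Hour1": [10.0, 20.0],
    "Hour2": [11.0, 21.0],
    "Hour3": [12.0, 22.0],
})

# stubnames="Hour" matches every column named Hour<number>;
# i= identifies rows, j= names the new suffix column.
long_df = pd.wide_to_long(df, stubnames="Hour", i="Server", j="time")
long_df = long_df.rename(columns={"Hour": "Value"}).reset_index()
long_df["time"] -= 1  # shift Hour1..HourN to zero-based periods 0..N-1
```

This avoids the row-by-row loop entirely, so it scales to 10k+ servers without issue.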
Alternatively, use `melt` and then strip the `Hour` text out of the `Time` column. Then do whatever additional processing is required on the `Time` and `Value` columns, such as subtracting 1 from the time period. Use `df.replace` or `df["Value"].str.replace` to alter values if needed.
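The `melt` variant, sketched on the same hypothetical sample frame (column names `Server`/`Hour1`…`Hour3` and the CPU values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample matching the question's wide layout.
df = pd.DataFrame({
    "Server": ["server1", "server2"],
    "Hour1": [10.0, 20.0],
    "Hour2": [11.0, 21.0],
    "Hour3": [12.0, 22.0],
})

# Melt every non-id column into (variable, value) pairs.
long_df = df.melt(id_vars="Server", var_name="time", value_name="Value")

# Strip the "Hour" prefix and convert to a zero-based integer period.
long_df["time"] = (
    long_df["time"].str.replace("Hour", "", regex=False).astype(int) - 1
)
long_df = long_df.sort_values(["Server", "time"]).reset_index(drop=True)
```

`melt` is handy when the column names don't follow a strict `stub<number>` pattern that `wide_to_long` can parse.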