Is there a more efficient way to reformat a pandas dataframe?


I'm working on an anomaly detection model for CPU data, and my current DataFrames are structured like this (10k+ servers, 168 hourly columns):

Server   Hour1   Hour2   Hour3
server1  val1.1  val1.2  val1.3
server2  val2.1  val2.2  val2.3

I need it to be in a structure like this:

Server   time  Value
server1  0     value0
server1  1     value1
server2  0     value0
server2  1     value1

The problem is that with 10k+ servers and 168 hourly values each, iterating row by row takes an eternity. Is there a more efficient way to do this transformation?

My current attempt builds a new DataFrame with nested for loops, like so:

# new_df has columns ["Server", "time", "Value"]; append one row per server-hour
for index, row in df.iterrows():
    for n in range(168):
        new_df.loc[len(new_df.index)] = [row["Server"], n, None]

# second pass: for each long row, scan the wide frame for the matching server
for index, row in new_df.iterrows():
    for i, r in df.iterrows():
        if row["Server"] == r["Server"]:
            # Hour columns start at position 1, so hour n lives at column 1 + n
            new_df.iat[index, 2] = df.iat[i, 1 + row["time"]]


1 answer

Stuart (accepted):

Use pd.wide_to_long:

import io
import pandas as pd

# recreate the sample wide frame (sep=r"\s+" so the whitespace-aligned sample parses)
df = pd.read_csv(io.StringIO("""
Server  Hour1   Hour2   Hour3
server1 val1.1  val1.2  val1.3
server2 val2.1  val2.2  val2.3"""), sep=r"\s+")

# stack the Hour<N> columns: stubname "Hour", id column "Server", suffix column "Time"
df = pd.wide_to_long(df, "Hour", "Server", "Time") \
    .rename(columns={"Hour": "Value"}) \
    .reset_index()

which results in

    Server  Time    Value
0  server1     1   val1.1
1  server2     1   val2.1
2  server1     2   val1.2
3  server2     2   val2.2
4  server1     3   val1.3
5  server2     3   val2.3
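
Note the result is ordered by Time rather than by Server; if you need rows grouped by server, as in the desired output, a sort afterwards will do it (ignore_index, available in pandas 1.0+, just renumbers the rows):

df = df.sort_values(["Server", "Time"], ignore_index=True)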

Alternatively, use melt and then strip the Hour text out of the Time column.

df = df.melt("Server", var_name="Time", value_name="Value")  # one row per (Server, Hour) pair
df["Time"] = df["Time"].str.strip("Hour").astype(int)        # "Hour1" -> 1

Then do whatever additional processing is required on the Time and Value columns, such as subtracting 1 from the time period to make it zero-based. Use df.replace or df["Value"].str.replace to alter values if needed.
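
For example, a minimal sketch of that post-processing, assuming the hours should be zero-based as in the desired output and that the real CPU readings are numeric:

df["Time"] = df["Time"] - 1                                # Hour1 -> 0, ..., Hour168 -> 167
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")  # non-numeric entries become NaN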