I have a dataset as follows:
import pandas as pd
df = pd.DataFrame(
{"Time": [1,2,3,1,2,3],
"Prop1": ["A","B","A","A","B","B"],
"Prop2": [4, 5, 1, 5, 4, 3]})
| Time | Prop1 | Prop2 |
|---|---|---|
| 1 | A | 4 |
| 2 | B | 5 |
| 3 | A | 1 |
| 1 | A | 5 |
| 2 | B | 4 |
| 3 | B | 3 |
What I want is the differences...
- At each time step
- For Prop1 value
- between Prop2
From this dataset I'd like to receive something like...
| Time | Prop1 | Prop2 | Prop2Diff |
|---|---|---|---|
| 1 | A | 4 | NaN |
| 2 | B | 5 | NaN |
| 3 | A | 1 | NaN |
| 1 | A | 5 | (5-4)= 1 |
| 2 | B | 4 | (4-5)= -1 |
| 3 | B | 3 | NaN |
I know there is .diff() function, but I am uncertain how to apply the type of conditions I'm wanting.
This is in an effort to remove deep loops from my scripts, I have already done:
time = df["Time"].unique()
for cTime in time:
for prop1 in df[df["Time"] == cTime,"Prop1"].unique():
df.loc[(df["Time"]==cTime) & (df["Prop1"] == prop1] = cDF["Prop2"][1] - cDF["Prop2"][0]
And other variations of the loops.
This is operating on very large datasets, so every loop I can remove tremendously helps.
Code
df