I have a LazyFrame of time, symbols and mid_price:
Example:
time symbols mid_price
datetime[ns] str f64
2024-03-01 00:01:00 "PERP_SOL_USDT@… 126.1575
2024-03-01 00:01:00 "PERP_WAVES_USD… 2.71235
2024-03-01 00:01:00 "SOL_USDT@BINAN… 126.005
2024-03-01 00:01:00 "WAVES_USDT@BIN… 2.7085
2024-03-01 00:02:00 "PERP_SOL_USDT@… 126.3825
I want to perform some aggregations over the time dimension (ie: group by symbol):
aggs = (
df
.group_by('symbols')
.agg([
pl.col('mid_price').diff(1).alias("change"),
])
)
I get back a list of each value per unique symbols value:
>>> aggs.head().collect()
symbols change
str list[f64]
"SOL_USDT@BINAN… [null, 0.25, … -0.55]
"PERP_SOL_USDT@… [null, 0.225, … -0.605]
"WAVES_USDT@BIN… [null, -0.002, … -0.001]
"PERP_WAVES_USD… [null, -0.00255, … 0.0001]
I would now like to join this back onto my original dataframe:
df = df.join(
aggs,
on='symbols',
how='left',
)
This now results in each row getting the full list of change, rather then the respective value.
>>> df.head().collect()
time symbols mid_price change
datetime[ns] str f64 list[f64]
2024-03-01 00:01:00 "PERP_SOL_USDT@… 126.1575 [null, 0.225, … -0.605]
2024-03-01 00:01:00 "PERP_WAVES_USD… 2.71235 [null, -0.00255, … 0.0001]
2024-03-01 00:01:00 "SOL_USDT@BINAN… 126.005 [null, 0.25, … -0.55]
2024-03-01 00:01:00 "WAVES_USDT@BIN… 2.7085 [null, -0.002, … -0.001]
2024-03-01 00:02:00 "PERP_SOL_USDT@… 126.3825 [null, 0.225, … -0.605]
I have 2 questions please:
- How do I unstack/explode the lists returned from my
group_bywhen joining them back into the original dataframe? - Is this the recommended way to add a new column to my original dataframe from a
group_by(that is:group_byfollowed byjoin)?
It sounds like you don't want to actually aggregate anything (and get a single value per symbol), but instead want to compute
"change"but independently for each symbol.In polars, this kind of behaviour, similar to window functions in PostgreSQL, can be achieved with
pl.Expr.over.On some example data, the resolt looks as follows.