Hy!
I need to unstack some columns in polars, but i want to group the columns by column 'a'
seq_ = df['a'].unique()
l = len(seq_)
dfs = []
for idx, a in enumerate(seq_):
df_tmp = df.filter(pl.col('a') == a)
steps = df_tmp.shape[0]
df_tmp = df_tmp.unstack(step=steps, how="horizontal",columns=['b', 'c'])
df_tmp = df_tmp.with_columns(pl.lit(a).alias('a'))
dfs.append(df_tmp)
df_tmp = pl.concat(dfs, how='diagonal')
this code is too much slower. Is there any way to do this in polars more faster?
from this:
┌─────┬───────┬──────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═══════╪══════╡
│ 1 ┆ a ┆ f │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ b ┆ g │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ c ┆ h │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ d ┆ i │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ e ┆ j │
└─────┴───────┴──────┘
to this:
┌─────┬───────┬──────┬─────┬──────┬──────┬──────┐
│ a ┆ b_0 ┆ b_1 ┆ b_2 ┆ c_0 ┆ c_1 ┆ c_2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪═══════╪══════╪═════╪══════╪══════╪══════╡
│ 1 ┆ a ┆ c ┆ d ┆ f ┆ h ┆ i │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ b ┆ e ┆null ┆ g ┆ j ┆ null │
└─────┴───────┴──────┴─────┴──────┴──────┴──────┘
group_by
withall
aggregation will transform theb
andc
columns into lists.list.to_struct()
, you can name each element with a custom field name, and it will be padded out withnull
s at this time per row. (The reason there'sdf = ...
steps below instead of all in one step is due to the wayfields
works when passing in a late-binding lambda, run into duplicate field issues)unnest
makes eachfield
of astruct
column its own column.There might be an alternate, cleaner solution with
pivot
, but at a glance I don't think this quite fits the mold of apivot
.