I have a DataFrame with ten columns, plus another column whose values are partial names of those ten columns. Here is a similar sample:
import polars as pl
df = pl.DataFrame({
"ID" :["A" ,"B" ,"C" ] ,
"A Left" :["W1" ,"W2" ,"W3" ] ,
"A Right":["P1" ,"P2" ,"P3" ] ,
"B Left" :["G1" ,"G2" ,"G3" ] ,
"B Right":["Y1" ,"Y2" ,"Y3" ] ,
"C Left" :["M1" ,"M2" ,"M3" ] ,
"C Right":["K1" ,"K2" ,"K3" ] ,
})
df
shape: (3, 7)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┐
│ ID ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╡
│ A ┆ W1 ┆ P1 ┆ G1 ┆ Y1 ┆ M1 ┆ K1 │
│ B ┆ W2 ┆ P2 ┆ G2 ┆ Y2 ┆ M2 ┆ K2 │
│ C ┆ W3 ┆ P3 ┆ G3 ┆ Y3 ┆ M3 ┆ K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┘
I want to add a column whose value is selected from the other columns based on the ID column, like below:
shape: (3, 8)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┬───────┐
│ ID ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right ┆ value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╪═══════╡
│ A ┆ W1 ┆ P1 ┆ G1 ┆ Y1 ┆ M1 ┆ K1 ┆ W1-P1 │
│ B ┆ W2 ┆ P2 ┆ G2 ┆ Y2 ┆ M2 ┆ K2 ┆ G2-Y2 │
│ C ┆ W3 ┆ P3 ┆ G3 ┆ Y3 ┆ M3 ┆ K3 ┆ M3-K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┴───────┘
I got this result using melt:
df.join(
    df.melt(id_vars="ID")
    .with_columns(pl.when(pl.col("ID") == pl.col("variable").str.slice(0, 1)).then(pl.col("value")))
    .select(["ID", "value"]).drop_nulls()
    .group_by("ID").agg(pl.col("value").str.concat()),
    on="ID",
).sort("ID")
However, I need to avoid melt because I have two groups of ten columns besides another 50 columns.
I have tried using pl.col() and polars.selectors, but I couldn't get the result.
import polars.selectors as cs
df.with_columns(
cs.by_name(
( pl.concat_str([pl.col('ID') , " Left"] ) )
).alias("value")
)
TypeError: ColumnFactory.__new__() missing 1 required positional argument: 'name'
Any suggested solution?
Thanks in advance.
It looks like you want to extract the "base" of the Left/Right columns.
There are various ways you could do that:
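As a rough sketch of two such ways (not necessarily the ones the answer originally showed), the base names can be derived from the column names alone. Here `columns` is a stand-in for `df.columns` from the question, and `bases` / `bases_re` are illustrative names:

```python
import re

# Stand-in for df.columns from the question's sample frame.
columns = ["ID", "A Left", "A Right", "B Left", "B Right", "C Left", "C Right"]

# Option 1: strip the " Left" suffix from every column that has it.
bases = [name.removesuffix(" Left") for name in columns if name.endswith(" Left")]

# Option 2: use a regex that captures everything before " Left" or " Right",
# then deduplicate, since each base appears twice.
pattern = re.compile(r"^(.+) (?:Left|Right)$")
bases_re = sorted({m.group(1) for name in columns if (m := pattern.match(name))})
```

Both options assume the columns follow the "<base> Left" / "<base> Right" naming shown in the sample; with a second group of columns under a different suffix, the suffix or pattern would need adjusting.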
You could then use pl.coalesce() to create a single column from the chosen when/then values: