Select column based on the value of another column Polars Python

171 views Asked by At

I have a df with ten columns and another column with its values are partial name of the ten columns. Here is a similar sample:

import polars as pl
df = pl.DataFrame({
    "ID"     :["A"  ,"B"  ,"C"  ] ,
    "A Left" :["W1" ,"W2" ,"W3" ] , 
    "A Right":["P1" ,"P2" ,"P3" ] , 
    "B Left" :["G1" ,"G2" ,"G3" ] , 
    "B Right":["Y1" ,"Y2" ,"Y3" ] , 
    "C Left" :["M1" ,"M2" ,"M3" ] , 
    "C Right":["K1" ,"K2" ,"K3" ] , 
    })
df
shape: (3, 7)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┐
│ ID  ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right │
│ --- ┆ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---    ┆ ---     │
│ str ┆ str    ┆ str     ┆ str    ┆ str     ┆ str    ┆ str     │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╡
│ A   ┆ W1     ┆ P1      ┆ G1     ┆ Y1      ┆ M1     ┆ K1      │
│ B   ┆ W2     ┆ P2      ┆ G2     ┆ Y2      ┆ M2     ┆ K2      │
│ C   ┆ W3     ┆ P3      ┆ G3     ┆ Y3      ┆ M3     ┆ K3      │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┘

I want to add a column with its value selected from the other columns based on ID column like below:

shape: (3, 8)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┬───────┐
│ ID  ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right ┆ value │
│ --- ┆ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---   │
│ str ┆ str    ┆ str     ┆ str    ┆ str     ┆ str    ┆ str     ┆ str   │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╪═══════╡
│ A   ┆ W1     ┆ P1      ┆ G1     ┆ Y1      ┆ M1     ┆ K1      ┆ W1-P1 │
│ B   ┆ W2     ┆ P2      ┆ G2     ┆ Y2      ┆ M2     ┆ K2      ┆ G2-Y2 │
│ C   ┆ W3     ┆ P3      ┆ G3     ┆ Y3      ┆ M3     ┆ K3      ┆ M3-K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┴───────┘

I got this result using melt:

df.join( df.melt(id_vars='ID').with_columns(
            pl.when(pl.col("ID") == pl.col("variable").str.slice(0,1)).then(pl.col("value"))
        ).select(["ID" , "value"]).drop_nulls().group_by("ID").agg(pl.col('value').str.concat()) 
        ,on='ID').sort("ID")

However, I need to avoid melt because I have two groups of ten columns beside other 50 columns.

I have tried using pl.col() and polars.selectors but I couldn't get the result.

import polars.selectors as cs
df.with_columns(
    cs.by_name(
        ( pl.concat_str([pl.col('ID') , " Left"] ) )
        ).alias("value")
)
TypeError: ColumnFactory.__new__() missing 1 required positional argument: 'name'

Any suggested solution ?

Thanks in advance.

1

There are 1 answers

0
jqurious On BEST ANSWER

It looks like you want to extract the "base" of the Left/Right columns.

There are various ways you could do that:

columns = pl.Series(df.select("^.+ (Left|Right)$").columns)
columns = columns.str.extract("(\S+)").unique()
shape: (3,)
Series: '' [str]
[
    "A"
    "B"
    "C"
]

You could then use pl.coalesce() to create a single column of the chosen when/then values:

df.with_columns(value = 
   pl.coalesce(
      pl.when(pl.col("ID") == col)
        .then(pl.col(f"{col} Left") + "-" + pl.col(f"{col} Right"))
      for col in columns
   )
)
shape: (3, 8)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┬───────┐
│ ID  ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right ┆ value │
│ --- ┆ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---   │
│ str ┆ str    ┆ str     ┆ str    ┆ str     ┆ str    ┆ str     ┆ str   │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╪═══════╡
│ A   ┆ W1     ┆ P1      ┆ G1     ┆ Y1      ┆ M1     ┆ K1      ┆ W1-P1 │
│ B   ┆ W2     ┆ P2      ┆ G2     ┆ Y2      ┆ M2     ┆ K2      ┆ G2-Y2 │
│ C   ┆ W3     ┆ P3      ┆ G3     ┆ Y3      ┆ M3     ┆ K3      ┆ M3-K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┴───────┘