How to update and concat at the same time in polars?

I have two polars dataframes of the same length:

import polars as pl

df1 = pl.DataFrame({'a': [1, 2, 3], 'b': [4, None, None]})
df2 = pl.DataFrame({'b': [None, 5, None], 'c': [6, 7, 8]})

df1
┌─────┬──────┐
│ a   ┆ b    │
╞═════╪══════╡
│ 1   ┆ 4    │
│ 2   ┆ null │
│ 3   ┆ null │
└─────┴──────┘

df2
┌──────┬─────┐
│ b    ┆ c   │
╞══════╪═════╡
│ null ┆ 6   │
│ 5    ┆ 7   │
│ null ┆ 8   │
└──────┴─────┘

I want to add df2 to df1 in a way that the columns that already exist in df1 get updated with values from df2, and the columns that are only in df2 get added to df1:

┌─────┬──────┬─────┐
│ a   ┆ b    ┆ c   │
╞═════╪══════╪═════╡
│ 1   ┆ 4    ┆ 6   │
│ 2   ┆ 5    ┆ 7   │
│ 3   ┆ null ┆ 8   │
└─────┴──────┴─────┘

The best I got is:

df1.update(df2).hstack(df2.select([c for c in df2.columns if c not in df1.columns]))

Is there a better way?

There is 1 answer

Dean MacGregor On

If you need to customize how the update works, here's a way to do it manually so you can tweak the individual steps.

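# Columns present in both frames, kept in df1's order
overlaps = [c for c in df1.columns if c in df2.columns]
(
    df1
    # Attach df2 as a struct column, renaming shared columns to avoid name clashes
    .with_columns(
        df2.rename({x: f"{x}_update" for x in overlaps}).to_struct('df2')
    )
    .unnest('df2')
    # Prefer the value from df2; fall back to df1 where df2 is null
    .with_columns(**{x: pl.coalesce(f"{x}_update", x) for x in overlaps})
    # A set has no stable order, so build the column list explicitly:
    # df1's columns first, then the columns that only df2 has
    .select(df1.columns + [c for c in df2.columns if c not in overlaps])
)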

If you don't need to tweak the update, just use the built-in update method, as in the question.