I have two polars dataframes of the same length:
import polars as pl
df1 = pl.DataFrame({'a': [1, 2, 3], 'b': [4, None, None]})
df2 = pl.DataFrame({'b': [None, 5, None], 'c': [6, 7, 8]})
df1
┌─────┬──────┐
│ a   ┆ b    │
╞═════╪══════╡
│ 1   ┆ 4    │
│ 2   ┆ null │
│ 3   ┆ null │
└─────┴──────┘
df2
┌──────┬─────┐
│ b    ┆ c   │
╞══════╪═════╡
│ null ┆ 6   │
│ 5    ┆ 7   │
│ null ┆ 8   │
└──────┴─────┘
I want to merge df2 into df1 so that the columns that already exist in df1 get updated with the values from df2, and the columns that exist only in df2 get added to df1:
┌─────┬──────┬─────┐
│ a   ┆ b    ┆ c   │
╞═════╪══════╪═════╡
│ 1   ┆ 4    ┆ 6   │
│ 2   ┆ 5    ┆ 7   │
│ 3   ┆ null ┆ 8   │
└─────┴──────┴─────┘
The best I've come up with is:
df1.update(df2).hstack(df2.select([c for c in df2.columns if c not in df1.columns]))
Is there a better way?
If you need to customize how the update works, you can do the update manually, which lets you tweak individual aspects of it.
If you don't need to tweak anything, just use the built-in way.