I have a Polars dataframe in which one column contains a repeating pattern of values. I grouped the dataframe by that column and added a new column to the grouped result, but now I need to unpack (ungroup) it. How can I do this in Polars?
My original dataframe looks like this:
| file | col1 | col2 |
|---|---|---|
| A | cell 1 | cell 2 |
| B | cell 3 | cell 4 |
| A | cell 5 | cell 6 |
| B | cell 7 | cell 8 |
I grouped the dataframe by `file`, added my desired new column, and got the output below.
| file | col1 | col2 | folder |
|---|---|---|---|
| A | [cell 1, cell 5] | [cell 2, cell 6] | [file1, file2] |
| B | [cell 3, cell 7] | [cell 4, cell 8] | [file1, file2] |
Now I want to ungroup the dataframe above back into its original format while keeping the new column. How can I do that? My actual dataframe is huge, with many rows and columns, so iterating over it is slow. Is there a function that can be applied to the whole dataframe instead of iterating column by column?
Final desired output:
| file | col1 | col2 | folder |
|---|---|---|---|
| A | cell 1 | cell 2 | file1 |
| B | cell 3 | cell 4 | file1 |
| A | cell 5 | cell 6 | file2 |
| B | cell 7 | cell 8 | file2 |
I have done the following:
import polars as pl
dfg = df.groupby('file').agg(pl.all())  # group by file; the other columns become lists
newdf = dfg.with_columns(pl.repeat(['file1', 'file2', 'file3'], dfg.height).alias('folder'))  # add the desired column
What is an efficient way to get the desired output? My dataframe is quite large, so iterating column by column is too time-consuming.
PS: I fixed a typo in the final table. Because the entries in the "file" column repeat every few rows, each repetition should be assigned a new "folder" name.
You can `explode` the list columns.
Your problem overall might be best solved by a `join` or some type of `over` expression, though.