How can I ungroup a polars dataframe in python?

Question

Asked by megha on 16 August 2023 at 11:40 · 283 views

I have a polars dataframe in which one column has a repeating pattern. I grouped the dataframe by that column and added a new column to the grouped dataframe. Now I need to unpack (ungroup) it. How can I do this in polars?

My original dataframe looks like this:

file	col1	col2
A	cell 1	cell 2
B	cell 3	cell 4
A	cell 5	cell 6
B	cell 7	cell 8

I used groupby to group the dataframe by file, then added my desired new column, and got the output below.

file	col1	col2	folder
A	[cell 1, cell 5]	[cell 2, cell 6]	[file1, file2]
B	[cell 3, cell 7]	[cell 4, cell 8]	[file1, file2]

Now I want to ungroup this dataframe back into the original format while keeping the new column. How can I do that? My actual dataframe is huge, with many rows and columns, so iterating over it is slow. Is there a function that can be applied to the whole dataframe instead of iterating column by column?

Final desired output:

file	col1	col2	folder
A	cell 1	cell 2	file1
B	cell 3	cell 4	file1
A	cell 5	cell 6	file2
B	cell 7	cell 8	file2

I have done the following:

import polars as pl

dfg = df.group_by("file", maintain_order=True).agg(pl.all())  # group the first time
newdf = dfg.with_columns(pl.Series("folder", [["file1", "file2"]] * dfg.height))  # add one folder list per group

What is an efficient way to get the desired output? My dataframe is large, so iterating column by column takes too long.

PS: Fixed a typo in the final table. Because the entries in the "file" column repeat every few rows, each repetition should be assigned the next "folder" name.

Original Q&A

There are 2 answers

**Wayoshi** · Answer 1 · 2023-08-16T16:29:36+00:00

You can explode:

dfg.explode(pl.exclude('file'))

Your problem overall might be best solved by a join or some type of over expression, though:

import polars as pl

df = pl.DataFrame(
    {
        'file': ['A', 'B'] * 2,
        'col1': [f'cell {i}' for i in range(1, 9, 2)],
        'col2': [f'cell {i}' for i in range(2, 9, 2)],
    }
)
df2 = pl.DataFrame({'file': ['A', 'B'], 'folder': ['file1', 'file2']})

df.join(df2, on='file')

shape: (4, 4)
┌──────┬────────┬────────┬────────┐
│ file ┆ col1   ┆ col2   ┆ folder │
│ ---  ┆ ---    ┆ ---    ┆ ---    │
│ str  ┆ str    ┆ str    ┆ str    │
╞══════╪════════╪════════╪════════╡
│ A    ┆ cell 1 ┆ cell 2 ┆ file1  │
│ B    ┆ cell 3 ┆ cell 4 ┆ file2  │
│ A    ┆ cell 5 ┆ cell 6 ┆ file1  │
│ B    ┆ cell 7 ┆ cell 8 ┆ file2  │
└──────┴────────┴────────┴────────┘

**jqurious** · Answer 2 · 2023-09-18T12:00:19+00:00

It looks like you're trying to "enumerate" each group.

You can use .cum_count() for that.

import polars as pl

df = pl.from_repr("""
┌──────┬─────────┬──────────┐
│ file ┆ col1    ┆ col2     │
│ ---  ┆ ---     ┆ ---      │
│ str  ┆ str     ┆ str      │
╞══════╪═════════╪══════════╡
│ A    ┆ cell 1  ┆ cell 2   │
│ B    ┆ cell 3  ┆ cell 4   │
│ A    ┆ cell 5  ┆ cell 6   │
│ B    ┆ cell 7  ┆ cell 8   │
│ A    ┆ cell 9  ┆ cell 10  │
│ B    ┆ cell 11 ┆ cell 12  │
│ A    ┆ cell 13 ┆ cell 14  │
│ B    ┆ cell 15 ┆ cell 16  │
│ A    ┆ cell 17 ┆ cell 18  │
│ B    ┆ cell 19 ┆ cell 20  │
└──────┴─────────┴──────────┘
""")

df.with_columns(folder = 
   pl.col("file").cum_count().over("file")
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ u32    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ 0      │
│ B    ┆ cell 3  ┆ cell 4  ┆ 0      │
│ A    ┆ cell 5  ┆ cell 6  ┆ 1      │
│ B    ┆ cell 7  ┆ cell 8  ┆ 1      │
│ …    ┆ …       ┆ …       ┆ …      │
│ A    ┆ cell 13 ┆ cell 14 ┆ 3      │
│ B    ┆ cell 15 ┆ cell 16 ┆ 3      │
│ A    ┆ cell 17 ┆ cell 18 ┆ 4      │
│ B    ┆ cell 19 ┆ cell 20 ┆ 4      │
└──────┴─────────┴─────────┴────────┘

You can turn it into a "repeating sequence" using modulo arithmetic.

df.with_columns(folder = 
   pl.col("file").cum_count().over("file").mod(3)
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ u32    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ 0      │
│ B    ┆ cell 3  ┆ cell 4  ┆ 0      │
│ A    ┆ cell 5  ┆ cell 6  ┆ 1      │
│ B    ┆ cell 7  ┆ cell 8  ┆ 1      │
│ …    ┆ …       ┆ …       ┆ …      │
│ A    ┆ cell 13 ┆ cell 14 ┆ 0      │
│ B    ┆ cell 15 ┆ cell 16 ┆ 0      │
│ A    ┆ cell 17 ┆ cell 18 ┆ 1      │
│ B    ┆ cell 19 ┆ cell 20 ┆ 1      │
└──────┴─────────┴─────────┴────────┘

You can then build the folder name with pl.format(), adding 1 so the names start at file1.

df.with_columns(folder = 
   pl.format("file{}", pl.col("file").cum_count().over("file").mod(3) + 1)
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ str    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ file1  │
│ B    ┆ cell 3  ┆ cell 4  ┆ file1  │
│ A    ┆ cell 5  ┆ cell 6  ┆ file2  │
│ B    ┆ cell 7  ┆ cell 8  ┆ file2  │
│ …    ┆ …       ┆ …       ┆ …      │
│ A    ┆ cell 13 ┆ cell 14 ┆ file1  │
│ B    ┆ cell 15 ┆ cell 16 ┆ file1  │
│ A    ┆ cell 17 ┆ cell 18 ┆ file2  │
│ B    ┆ cell 19 ┆ cell 20 ┆ file2  │
└──────┴─────────┴─────────┴────────┘
