How to transform dataframe long to wide with "grouped" columns?

Question

How to transform dataframe long to wide with "grouped" columns?

78 views Asked by HedgeHog At 26 March 2024 at 16:15

When pivoting the following dataframe from long to wide, I would like to get "groups" of columns and mark them with a prefix or suffix.

The groups of elements can have different sizes, i.e. consist of one, two or more grouped elements/rows, I used pairs of two here to keep the example simple.

import pandas as pd

df = pd.DataFrame(
    [
        {'group': 'group-009297534',  'single_id': 'single-011900051',  'country': 'ESP',  'name': '00000911'},
        {'group': 'group-009297534',  'single_id': 'single-000000821',  'country': 'USA',  'name': '00001054'},
        {'group': 'group-009280053',  'single_id': 'single-000000002',  'country': 'HUN',  'name': '00000496'},
        {'group': 'group-009280053',  'single_id': 'single-000000014',  'country': 'HUN',  'name': '00000795'},
        {'group': 'group-009245039',  'single_id': 'single-000001258',  'country': 'NOR',  'name': '00000527'},
        {'group': 'group-009245039',  'single_id': 'single-000000669',  'country': 'TWN',  'name': '00000535'}
    ]
)

My approach of assigning an index to the elements to be grouped and then using this for the column designation is already going in the right direction, but still deviates from the expected view

df['idx'] = df.groupby('group').cumcount()
df.pivot(index='group', columns='idx')

group	('single_id', 0)	('single_id', 1)	('country', 0)	('country', 1)	('name', 0)	('name', 1)
group-009245039	single-000001258	single-000000669	NOR	TWN	00000527	00000535
group-009280053	single-000000002	single-000000014	HUN	HUN	00000496	00000795
group-009297534	single-011900051	single-000000821	ESP	USA	00000911	00001054

However, the expected solution would look like this:

	group	single_id_1	country_1	name_1	single_id_2	country_2	name_2
0	group-009245039	single-000001258	NOR	00000527	single-000000669	TWN	00000535
1	group-009280053	single-000000002	HUN	00000496	single-000000014	HUN	00000795
2	group-009297534	single-011900051	ESP	00000911	single-000000821	USA	00001054

I'm not sure whether the approach with the multi-index, which would then have to be sorted and merged somehow, is the right one or whether there is a more elegant option.

Original Q&A

There are 3 answers

DataSciRookie On 26 March 2024 at 16:27

Is it what you are looking for ?

import pandas as pd

df = pd.DataFrame([
    {'group': 'group-009297534',  'single_id': 'single-011900051',  'country': 'ESP',  'name': '00000911'},
    {'group': 'group-009297534',  'single_id': 'single-000000821',  'country': 'USA',  'name': '00001054'},
    {'group': 'group-009280053',  'single_id': 'single-000000002',  'country': 'HUN',  'name': '00000496'},
    {'group': 'group-009280053',  'single_id': 'single-000000014',  'country': 'HUN',  'name': '00000795'},
    {'group': 'group-009245039',  'single_id': 'single-000001258',  'country': 'NOR',  'name': '00000527'},
    {'group': 'group-009245039',  'single_id': 'single-000000669',  'country': 'TWN',  'name': '00000535'}
])

df['idx'] = (df.groupby('group').cumcount() + 1).astype(str)


# Pivot the DataFrame with the new 'idx' for differentiation
df_pivoted = df.pivot(index='group', columns='idx')

# Flatten the MultiIndex and format column names
df_pivoted.columns = [f'{x[0]}_{x[1]}' for x in df_pivoted.columns]

# Reset the index to bring 'group' back as a column
df_pivoted.reset_index(inplace=True)

# Optional: Reorder the columns according to your expected output
# This assumes you know the order and number of groups
expected_order = [
    'group', 
    'single_id_1', 'country_1', 'name_1',
    'single_id_2', 'country_2', 'name_2'
]
df_pivoted = df_pivoted[expected_order]

print(df_pivoted)

Output :

  group       single_id_1 country_1    name_1       single_id_2  \
0  group-009245039  single-000001258       NOR  00000527  single-000000669   
1  group-009280053  single-000000002       HUN  00000496  single-000000014   
2  group-009297534  single-011900051       ESP  00000911  single-000000821   

  country_2    name_2  
0       TWN  00000535  
1       HUN  00000795  
2       USA  00001054

Onyambu On 26 March 2024 at 19:58

You can use pivot_wider from janitor

import janitor
df['cols'] = df.groupby('group').cumcount() + 1
df.pivot_wider(index = 'group', names_from = 'cols')

            group      single_id_1      single_id_2 country_1 country_2 name_1 name_2
0 group-009245039 single-000001258 single-000000669       NOR       TWN    527    535
1 group-009280053 single-000000002 single-000000014       HUN       HUN    496    795
2 group-009297534 single-011900051 single-000000821       ESP       USA    911   1054

Equivalently you could use pivot_wider from siuba:

from siuba import _
from siuba.experimental.pivot import pivot_wider

df['cols'] = df.groupby('group').cumcount() + 1
pivot_wider(df, names_from = _.cols, 
                values_from = _['single_id', 'country', 'name'],
                names_vary = 'slowest')

            group      single_id_1 country_1 name_1      single_id_2 country_2 name_2
0 group-009245039 single-000001258       NOR    527 single-000000669       TWN    535
1 group-009280053 single-000000002       HUN    496 single-000000014       HUN    795
2 group-009297534 single-011900051       ESP    911 single-000000821       USA   1054

**Panda Kim** · Accepted Answer · 2024-03-26T18:01:35+00:00

Code

out = (df
       .assign(index=df.groupby('group').cumcount().add(1).astype('str'))
       .pivot(index='group', columns='index')
       .pipe(lambda x: x.set_axis(x.columns.map('_'.join), axis=1))
       .sort_index(axis=1, 
                   key=lambda x: x.str.extract(r'.+_(\d+$)')[0].astype('int'))
       .reset_index()
)

out

TechQA.

How to transform dataframe long to wide with "grouped" columns?

There are 3 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in PIVOT

Popular Questions

Trending Questions