Pivot or Melt a Pandas DataFrame

65 views Asked by At

I have a DataFrame that has the following columns: 'Series Name', 'Country Name', '2000', '20001' , ..., '2018'. In 'Series Name' I have the variables that I want to analize (for example, 'PBI', 'Surface Area', etc). So, in 'Series Name' I have repeated values (PBI for each Country, for example).

Original DataFrame

Series Name Country Name 2000 [YR2000] 2001 [YR2001]
Superficie (kilómetros cuadrados) Estados Unidos 9632030 9632030
Superficie (kilómetros cuadrados) Unión Europea 4384964,995 4384984,995
Superficie (kilómetros cuadrados) Japón 377800 377880

I want to reorder the DataFrame in this way: - Columns: 'Country Name', 'Year' and all the unique values of 'Series Name' - Year column: take the the values from 2000 to 2018 for each Country - 'Series Name' columns : each column ('PBI', 'Surface' etc) has the value for the respective Year and Country.

Country Name Year Superficie (kilómetros cuadrados)
Estados Unidos 2000 9632030
Estados Unidos 2001 9632030
Unión Europea 2000 4384964,995
Unión Europea 2001 4384964,995
Japón 2000 377800
Japón 2001 377800

I tried with melt, but I really don´t know how to use it:

indicadores_clean_pv = pd.melt(indicadores_clean, id_vars=['Country Name'], var_name='Año')

What I Tried

I´ve also tried :

indicadores_clean_pv.melt(id_vars='Country Name', var_name='Year').groupby(['Country Name','Year']).agg({'Series Name':'first'}).reset_index()

But I can't put the 'Series Name' values in the columns

Can anyone help me? Thanks in advance!

2

There are 2 answers

0
Corralien On

You can use pivot_table:

out = (df.pivot_table(index='Country Name', columns='Series Name', 
                      aggfunc='first', sort=False)
         .rename_axis(columns=['Year', None])
         .stack('Year').reset_index()
         .assign(Year=lambda x: x['Year'].str.split().str[0]))

Output:

>>> out
     Country Name  Year Superficie (kilómetros cuadrados)
0  Estados Unidos  2000                           9632030
1  Estados Unidos  2001                           9632030
2   Unión Europea  2000                       4384964,995
3   Unión Europea  2001                       4384984,995
4           Japón  2000                            377800
5           Japón  2001                            377880
1
Andrej Kesely On

You can try simple .stack()/.unstack():

out = (
    df.set_index(["Country Name", "Series Name"])
    .stack()
    .unstack(level=1)
    .rename_axis(columns=None, index=["Country Name", "Year"])
    .reset_index()
)

print(out)

Prints:

     Country Name  Year Superficie (kilómetros cuadrados)
0  Estados Unidos  2000                           9632030
1  Estados Unidos  2001                           9632030
2           Japón  2000                            377800
3           Japón  2001                            377880
4   Unión Europea  2000                       4384964,995
5   Unión Europea  2001                       4384984,995