Pandas: how to reorder dataframe

Example:

import pandas as pd

data = {'P1_1': ['1', '6', '5', '8', '4', '7', '5', '7', '1', '7', '3', '2', '1', '4', '7', '5', '7', '1'],
        'P1_2': ['3', '7', '7', '9', '8', '10', '8', '9', '3', '10', '9', '5', '3', '8', '9', '6', '7', '5'],
        'P2_1': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18'],
        'P2_2': ['3', '7', '7', '9', '8', '10', '8', '9', '8', '10', '12', '13', '14', '8', '17', '8', '2', '5']}
df = pd.DataFrame(data)

This is the df that I have.

Now I want to reshape the columns. P1 and P2 are category names, and the suffixes _1 and _2 are time points. I want the categories in rows and the time points in columns, keeping the values. It should look like this:

[image: desired output]

In the second example I added a third P category only to have more values.

I suspect there is a standard way to do this. Can anyone point me in the right direction?

There are 2 answers

jezrael

Use:

import pandas as pd

data={'Punkte_Teil1_1': ['1', '6', '5','8', '4', '7', '5', '7', '1', '7', '3', '2', '1', '4', '7', '5', '7', '1'],
      'Punkte_Teil1_2': ['3', '7', '7','9', '8', '10', '8', '9', '3', '10', '9', '5', '3', '8', '9', '6', '7', '5'],
      'Punkte_Teil2_1': ['1', '2', '3','4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18'],
      'Punkte_Teil2_2': ['3', '7', '7','9', '8', '10', '8', '9', '8', '10', '12', '13', '14', '8', '17', '8', '2', '5']}
df=pd.DataFrame(data)

print (df)
   Punkte_Teil1_1 Punkte_Teil1_2 Punkte_Teil2_1 Punkte_Teil2_2
0               1              3              1              3
1               6              7              2              7
2               5              7              3              7
3               8              9              4              9
4               4              8              5              8
5               7             10              6             10
6               5              8              7              8
7               7              9              8              9
8               1              3              9              8
9               7             10             10             10
10              3              9             11             12
11              2              5             12             13
12              1              3             13             14
13              4              8             14              8
14              7              9             15             17
15              5              6             16              8
16              7              7             17              2
17              1              5             18              5

Use DataFrame.pipe to turn the columns into a MultiIndex by splitting each name on the last _ with str.rsplit, then name the column levels with DataFrame.rename_axis, reshape with DataFrame.stack, and add the T prefix with DataFrame.add_prefix. Finally, the first DataFrame.reset_index converts the second level of the row MultiIndex into a column, and the second one creates a default RangeIndex:

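df = (df.pipe(lambda x: x.set_axis(x.columns.str.rsplit('_', expand=True, n=1), axis=1))  # split names on last '_'
        .rename_axis(['Cat',None], axis=1)   # name the category level
        .stack(0)                            # move the categories into the row index
        .add_prefix('T')                     # time points become T1, T2
        .reset_index(level=1)                # category level -> 'Cat' column
        .reset_index(drop=True))             # default RangeIndex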
print (df.head())
            Cat T1 T2
0  Punkte_Teil1  1  3
1  Punkte_Teil2  1  3
2  Punkte_Teil1  6  7
3  Punkte_Teil2  2  7
4  Punkte_Teil1  5  7
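
To see what each step of the chain does, here is the same idea unrolled on a tiny frame. This is only a sketch: the frame `small` and the names `cols`, `step1`, `step2` and `result` are made up for illustration, and on some pandas 2.x versions `stack` may print a FutureWarning about its new implementation.

```python
import pandas as pd

# Tiny hypothetical frame with the same naming scheme: <category>_<timepoint>
small = pd.DataFrame({
    "P1_1": ["1", "6"],
    "P1_2": ["3", "7"],
    "P2_1": ["1", "2"],
    "P2_2": ["3", "7"],
})

# Step 1: split each column name on the last "_" into a two-level MultiIndex
cols = small.columns.str.rsplit("_", n=1, expand=True)
step1 = small.set_axis(cols, axis=1).rename_axis(["Cat", None], axis=1)

# Step 2: move the category level from the columns into the row index
step2 = step1.stack(0)

# Step 3: prefix the time-point columns, turn the category level into a
# column and replace the index with a default RangeIndex
result = (step2.add_prefix("T")
               .reset_index(level=1)
               .reset_index(drop=True))
print(result)
```

Printing `step1` and `step2` along the way is probably the quickest way to understand the reshaping.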

jakelime

I don't fully understand what your data means or why you want it in that shape, but I can share some pandas techniques.

I hope this points you in the direction you need.

import pandas as pd

data = ...

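def concat_values(values):
    # pivot_table passes each group's values as one Series; join them into a single string
    return ";".join(values)

df = pd.DataFrame(data)
df = df.melt()  # long format: one row per (column name, value)
# Split e.g. "P1_1" on the last underscore into category "P1" and time point "T1"
df["category"] = df["variable"].apply(lambda x: x.rsplit("_", 1)[0])
df["timepoint"] = df["variable"].apply(lambda x: f"T{x.rsplit('_', 1)[1]}")
df = pd.pivot_table(
    df, index="category", columns="timepoint", values="value", aggfunc=concat_values
)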
print(df)

Expected results:

timepoint                                            T1                                         T2
category                                                                                          
P1                  1;6;5;8;4;7;5;7;1;7;3;2;1;4;7;5;7;1      3;7;7;9;8;10;8;9;3;10;9;5;3;8;9;6;7;5
P2         1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18  3;7;7;9;8;10;8;9;8;10;12;13;14;8;17;8;2;5
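
If you would rather keep every value on its own row instead of joining them into one string, the same melt idea can carry the original row number along. This is a rough sketch, assuming the `<category>_<timepoint>` naming from the question; the frame `small` and the names `long`, `parts` and `wide` are made up for illustration.

```python
import pandas as pd

# Tiny hypothetical frame with the same naming scheme as the question
small = pd.DataFrame({
    "P1_1": ["1", "6"],
    "P1_2": ["3", "7"],
    "P2_1": ["1", "2"],
    "P2_2": ["3", "7"],
})

# Keep the original row number so each observation stays on its own row
long = small.rename_axis("row").reset_index().melt(id_vars="row")

# Split "P1_1" on the last underscore into category "P1" and time point "T1"
parts = long["variable"].str.rsplit("_", n=1, expand=True)
long["Cat"] = parts[0]
long["timepoint"] = "T" + parts[1]

# One row per (row, category) pair, one column per time point
wide = (long.pivot(index=["row", "Cat"], columns="timepoint", values="value")
            .reset_index(level="Cat")
            .reset_index(drop=True)
            .rename_axis(None, axis=1))
print(wide)
```

Since `pivot` does not aggregate, it needs each (row, category, time point) combination to appear only once, which is why the row number is carried through the melt.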