Restructuring 2-level data to 3-level data

35 views Asked by At

I have a two-level multilevel data. Below is an example of it. I have two participants that reported their fruits consumption for two days.

id day apple orange
1   1    34     12
1   2    54     43
2   1    65     56
2   2    87     81

I want to restructure the dataframe from 2-level data to 3-level data. I want to a) make a variable for type of fruit; b) nest fruit consumption in days in one column. So the dataframe looks like below:

id day fruit_type fruit_consumption
1   1      apple                34
1   1     orange                12
1   2      apple                54
1   2     orange                43
2   1      apple                65
2   1     orange                56
2   2      apple                87
2   2     orange                81
1

There are 1 answers

1
Arty On

If you're are referring to Python's pandas DataFrame then I've coded for it next solution, a bit generic, not only for two columns apple/orange that need to be rearranged but for any number of columns, controlled by specifying range crng in the beginning of script. Should work fast as it uses plain numpy with small number of python's loop iterations.

You can also try next code online here!.

import pandas as pd, numpy as np

crng = (2, 4) # Range of columns for rearranging

df = pd.DataFrame([map(int, line.split()) for line in """
    1   1    34     12
    1   2    54     43
    2   1    65     56
    2   2    87     81
""".splitlines() if line.strip()], columns = ['id', 'day', 'apple', 'orange'])
print('input:\n', df)

a = df.values
b = np.zeros((a.shape[0], crng[1] - crng[0], crng[0] + 2), dtype = np.object_)

for icol, col in enumerate(df.columns[crng[0] : crng[1]]):
    b[:, icol, 0] = a[:, 0]
    b[:, icol, 1] = a[:, 1]
    b[:, icol, 2] = df.columns[crng[0] + icol]
    b[:, icol, 3] = a[:, crng[0] + icol]

b = b.reshape((b.shape[0] * b.shape[1], b.shape[2]))

df = pd.DataFrame(data = b,
    columns = df.columns[:crng[0]].values.tolist() + ['fruit_type', 'fruit_consumption'],
)
print('output:\n', df)