I want to have something similar to pos_explode in pandas, i.e. keep the index of the element in the original array.

df = pd.DataFrame({'metric': {24: 53, 68: 93, 86: 38},
 'label': {24: 1, 68: 1, 86: 1},
 'group_1': {24: 1, 68: 1, 86: 1},
 'group_2': {24: 1, 68: 1, 86: 1},
 'metric_group_0': {24: np.array([72, 41, 96]),
  68: np.array([85, 56, 33]),
  86: np.array([26, 85, 26])}})
df = df.reset_index(drop=True)
df = df.reset_index(drop=False)
df = df.set_index(['index'])
display(df)
s=pd.DataFrame({'metric_group_0':np.concatenate(df.metric_group_0.values)},index=df.index.repeat(df.metric_group_0.str.len()))
display(s)
s.join(df.drop('metric_group_0',1),how='left')

This explodes the data but is losing the index. How can I keep the index as an additional column? I.e. in this example it would be [1,2,3] for each pandas.Index.

       metric  label  group_1  group_2 metric_group_0
index                                                
0          53      1        1        1   [72, 41, 96]
1          93      1        1        1   [85, 56, 33]
2          38      1        1        1   [26, 85, 26]

is currently converted to:

       metric_group_0  metric  label  group_1  group_2
index                                                 
0                  72      53      1        1        1
0                  41      53      1        1        1
0                  96      53      1        1        1
1                  85      93      1        1        1
1                  56      93      1        1        1
1                  33      93      1        1        1
2                  26      38      1        1        1
2                  85      38      1        1        1
2                  26      38      1        1        1

but is missing the original index. The desired output would look like:

       metric_group_0  metric  label  group_1  group_2 pos_in_array
index                                                 
0                  72      53      1        1        1  1
0                  41      53      1        1        1  2
0                  96      53      1        1        1  3
1                  85      93      1        1        1  1
1                  56      93      1        1        1  2
1                  33      93      1        1        1  3
2                  26      38      1        1        1  1
2                  85      38      1        1        1  2
2                  26      38      1        1        1  3

1 Answers

1
Erfan On Best Solutions

You can create this column with groupby.cumcount and we use the index as groups:

df['pos_in_array'] = df.groupby(df.index).cumcount()+1

print(df)
       metric_group_0  metric  label  group_1  group_2  pos_in_array
index                                                               
0                  72      53      1        1        1             1
0                  41      53      1        1        1             2
0                  96      53      1        1        1             3
1                  85      93      1        1        1             1
1                  56      93      1        1        1             2
1                  33      93      1        1        1             3
2                  26      38      1        1        1             1
2                  85      38      1        1        1             2
2                  26      38      1        1        1             3

So your whole code would look like the following, since you didn't assign your newly created dataframe into a variable yet:

df = df.reset_index(drop=True)
df = df.reset_index(drop=False)
df = df.set_index(['index'])

s=pd.DataFrame({'metric_group_0':np.concatenate(df.metric_group_0.values)},
               index=df.index.repeat(df.metric_group_0.str.len()))

df = s.join(df.drop('metric_group_0',1),how='left')

df['pos_in_array'] = df.groupby(df.index).cumcount()+1