Scalable approach to expand a list column into separate columns of a pandas dataframe in Python


I have a pandas dataframe with a single column. Each cell in that column holds a list/array of numbers, and every list has length 100, consistently across all rows.

We need to turn each list element into its own column value. In other words, we want a dataframe with 100 columns, where each column holds the list/array item at that position.

Something like this (image not preserved)

becomes this (image not preserved)
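
Since the screenshots are not available, here is a small stand-in for what the input and the desired output might look like. The values are made up for illustration, and lists of length 4 are used instead of 100 to keep it readable; the column name `message` and the `col_i` naming are taken from the snippet in the question.

```python
import pandas as pd

# Hypothetical input: a single 'message' column, each cell a list of 4 numbers
# (the real data has lists of length 100).
df = pd.DataFrame({'message': [[1, 2, 3, 4], [5, 6, 7, 8]]})

# Desired output, written out by hand: one column per list position.
expected = pd.DataFrame({
    'col_0': [1, 5],
    'col_1': [2, 6],
    'col_2': [3, 7],
    'col_3': [4, 8],
})

print(df)
print(expected)
```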

It can be done with iterrows() as shown below, but we have around 1.5 million rows and need a scalable solution, since iterrows() would take a lot of time.

import pandas as pd

# Slow approach: appends one row at a time to a growing dataframe
cols = [f'col_{i}' for i in range(4)]
df_inter = pd.DataFrame(columns=cols)
for index, row in df.iterrows():
    df_inter.loc[len(df_inter)] = row['message']

There are 2 answers

Mayank Porwal (BEST ANSWER)

You can build the new dataframe directly from the list of lists:

In [28]: df = pd.DataFrame({'message':[[1,2,3,4,5], [3,4,5,6,7]]})

In [29]: df
Out[29]: 
           message
0  [1, 2, 3, 4, 5]
1  [3, 4, 5, 6, 7]

In [30]: res = pd.DataFrame(df.message.tolist(), index= df.index)

In [31]: res
Out[31]: 
   0  1  2  3  4
0  1  2  3  4  5
1  3  4  5  6  7
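
If you also want the `col_i` names from the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name `expand_list_column` and its parameters are made up for illustration, the sample data is random, and it assumes every list has the same length.

```python
import numpy as np
import pandas as pd

def expand_list_column(df, column='message', prefix='col_'):
    """Expand a column of equal-length lists into one column per position."""
    values = df[column].tolist()              # list of lists, one per row
    width = len(values[0]) if values else 0   # assumes all lists share this length
    names = [f'{prefix}{i}' for i in range(width)]
    return pd.DataFrame(values, columns=names, index=df.index)

# Made-up sample: 1000 rows, each holding a list of 100 integers
rng = np.random.default_rng(0)
df = pd.DataFrame({'message': rng.integers(0, 10, size=(1000, 100)).tolist()})

res = expand_list_column(df)
print(res.shape)  # (1000, 100)
```

Passing `index=df.index` keeps the original row labels, which matters if the dataframe was filtered or re-indexed before this step.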
Brian Larsen

I think this would work:

df.message.apply(pd.Series)
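
One thing to keep in mind is that apply(pd.Series) constructs a Series object for every row, so on large inputs it tends to be much slower than building the dataframe from a list of lists. The sketch below uses made-up data to check that both approaches give the same values and to time them; the actual numbers will vary by machine and pandas version.

```python
import time
import pandas as pd

# Made-up sample: 2000 rows, each holding a list of 4 integers
df = pd.DataFrame({'message': [[i, i + 1, i + 2, i + 3] for i in range(2000)]})

start = time.perf_counter()
via_apply = df['message'].apply(pd.Series)
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
via_tolist = pd.DataFrame(df['message'].tolist(), index=df.index)
tolist_seconds = time.perf_counter() - start

# Both approaches should produce the same values
same = bool((via_apply.to_numpy() == via_tolist.to_numpy()).all())
print(f'same values: {same}')
print(f'apply(pd.Series): {apply_seconds:.4f}s, tolist: {tolist_seconds:.4f}s')
```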

To scale this with dask (assuming it is installed); note that the result is lazy and is only materialized when you call .compute() on it:

import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=8)
ddf.message.apply(pd.Series, meta={0: 'object'})