Pandas Series of lists to one series

Question


55.2k views Asked by Max At 17 June 2015 at 07:29

I have a Pandas Series of lists of strings:

0    [slim, waist, man]
1     [slim, waistline]
2               [santa]

As you can see, the lists vary in length. I want an efficient way to collapse this into one series:

0 slim
1 waist
2 man
3 slim
4 waistline
5 santa

I know I can break up the lists using

series_name.str.split(' ')

But I am having a hard time putting those strings back into one list.

Thanks!


There are 10 answers

Anand S Kumar On 17 June 2015 at 07:47

You can use the list concatenation operator like below -

lst1 = ['hello','world']
lst2 = ['bye','world']
newlst = lst1 + lst2
print(newlst)
>> ['hello', 'world', 'bye', 'world']

Or you can use the list.extend() method as below -

lst1 = ['hello','world']
lst2 = ['bye','world']
lst1.extend(lst2)
print(lst1)
>> ['hello', 'world', 'bye', 'world']

The benefit of extend() is that it accepts any iterable, whereas the concatenation operator works only when both operands are lists.

Another example of extend(), starting again from a fresh lst1 -

lst1 = ['hello','world']
lst1.extend(('Bye','Bye'))
print(lst1)
>> ['hello', 'world', 'Bye', 'Bye']

peterfields On 17 June 2015 at 07:55

You can try using itertools.chain to simply flatten the lists:

In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]: 
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]: 
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object
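
A closely related variant, offered here as a sketch rather than as part of the original answer, is chain.from_iterable. It takes the series itself, so the lists do not have to be unpacked into separate arguments:

```python
from itertools import chain

import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# from_iterable walks the series one inner list at a time
new_s = pd.Series(list(chain.from_iterable(s)))
print(new_s)
```

The result gets a fresh 0..5 index, as in the question's desired output.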

tegancp On 17 June 2015 at 07:57

You are basically just trying to flatten a nested list here.

You should just be able to iterate over the elements of the series:

slist = []
for x in s:
    slist.extend(x)

or a slicker (but harder to understand) list comprehension:

slist = [st for row in s for st in row]
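
Either version leaves you with a plain Python list. To get back to a Series, as the question asks, the list can be passed to the Series constructor. A minimal sketch, assuming s is the series from the question:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Flatten with the nested comprehension, then wrap the list in a new Series
flat = pd.Series([st for row in s for st in row])
print(flat)
```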

Adarsh Namdev On 19 October 2019 at 07:09

You may also try:

combined = []
for i in range(len(s)):  # iterate positions, so this also works with a non-default index
    combined = combined + s.iloc[i]

print(combined)

s = pd.Series(combined)
print(s)

output:

['slim', 'waist', 'man', 'slim', 'waistline', 'santa']

0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object

Roman Kotov On 24 August 2019 at 08:50

pandas 0.25.0 introduced a new method, explode, for Series and DataFrames. Older versions do not have it.

It builds the result you need.

For example, given this series:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then you can use

s.explode()

to get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
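
Note that explode repeats the original index labels (0, 0, 0, 1, 1, 2). If you want the fresh 0..5 index shown in the question, one option is to reset the index afterwards. A short sketch:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Drop the repeated labels and renumber from 0
flat = s.explode().reset_index(drop=True)
# On pandas 1.1 or later, s.explode(ignore_index=True) should be equivalent
print(flat)
```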

For a DataFrame:

df = pd.DataFrame({
  's': pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']
   ]),
   'a': 1
})

This gives the DataFrame:

                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1

Applying explode to the s column:

df.explode('s')

gives this result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1

If your series contains empty lists:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])

Then running explode will introduce NaN values for empty lists, like this:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
3          NaN

If this is not desired, you can chain a dropna call:

s.explode().dropna()

To get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa

DataFrames also have a dropna method:

df = pd.DataFrame({
  's': pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
   ]),
   'a': 1
})

Running explode without dropna:

df.explode('s')

results in:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1

With dropna:

df.explode('s').dropna(subset=['s'])

Result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1

DaveFar On 29 January 2023 at 15:07

The accepted answer (by @mcwitt) looks nicely pandas-ish, but it is awfully slow, is extremely memory hungry if there are outliers in the list sizes, and is buggy (see the comments on that answer).

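+1 to @Tadej Magajna for his answer, which takes the sum() over the series. Since that adds the lists together one at a time, a more efficient way, in case the series elements are NumPy arrays, is numpy's concatenate(). (Calling flatten() on series_name.values does not help here: the result is still a 1-D object array of the nested arrays.)

np.concatenate(series_name.values)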
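
As a hedged illustration of the NumPy-array case, the sketch below uses made-up sample data and np.concatenate to merge the per-row arrays. It also shows that flatten() on the object array returned by .values leaves the nested arrays unmerged:

```python
import numpy as np
import pandas as pd

# Sample data (an assumption for illustration): a series whose elements are arrays
s = pd.Series([np.array(['slim', 'waist', 'man']),
               np.array(['slim', 'waistline']),
               np.array(['santa'])])

# .values is a 1-D object array holding three arrays; flatten() keeps it that way
still_nested = s.values.flatten()

# np.concatenate joins the per-row arrays into one flat array
flat = pd.Series(np.concatenate(s.values))
print(flat)
```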

Tadej Magajna On 10 July 2018 at 15:01

series_name.sum()

does exactly what you need. Do make sure it's a series of lists; otherwise your values will be concatenated (if strings) or added (if ints).
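
A small sketch of both behaviours, using the question's data plus a made-up integer series. Keep in mind that summing lists builds a new list at every step, so this is probably best kept to small series:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Summing a series of lists concatenates them with +
flat = pd.Series(s.sum())

# The same call on integers (illustrative data) is ordinary addition
total = pd.Series([1, 2, 3]).sum()
print(flat)
print(total)
```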

vozman On 04 February 2019 at 12:22

Flattening and unflattening can be done using this function

def flatten(df, col):
    # Build one (original index, element) row per list element
    col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).items() for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    df = df.drop(columns=col)
    df = df.merge(col_flat, left_index=True, right_index=True)

    return df

Unflattening:

def unflatten(flat_df, col):
    # Regroup by the original index: keep the first value of every column, collect col back into lists
    return flat_df.groupby(level=0).agg({**{c:'first' for c in flat_df.columns}, col: list})

After unflattening we get the same dataframe except column order (here assuming the list column is named 's'):

(df.sort_index(axis=1) == unflatten(flatten(df, 's'), 's').sort_index(axis=1)).all().all()
>> True

EliadL On 19 March 2020 at 16:19

If your pandas version is too old to use series_name.explode(), this should work too:

from itertools import chain

pd.Series(
    chain.from_iterable(
        value
        for i, value
        in series_name.items()  # iteritems() was removed in pandas 2.0
    )
)

mcwitt On 11 January 2017 at 18:44 (Accepted Answer)

Here's a simple method using only pandas functions:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then

s.apply(pd.Series).stack().reset_index(drop=True)

gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.

0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa

If this is what you want, just omit .reset_index(drop=True) from the chain.
