How to index data by elements of lists in columns?

Question

How to index data by elements of lists in columns?

79 views Asked by Stanko Kovacevic At 21 December 2016 at 17:06

I have the following DataFrame (named df2 later):

    recipe_id                                        ingredients
0        3332         [11307, 11322, 11632, 11338, 11478, 11438]
1        3333  [11322, 11338, 11632, 11314, 11682, 11478, 108...
2        3334  [11632, 11682, 11338, 11337, 10837, 11435, 113...
3        3335  [11149, 11322, 11532, 11996, 10616, 10837, 113...
4        3336  [11330, 11632, 11422, 11256, 11338, 11314, 114...
5        3812                      [959, 92, 3, 554, 12271, 202]
...

I want to create another DataFrame which will have following colums : ['ingredients', "recipe_id1", "recipe_id2", ..., "recipe_idn"], where n is total number of recipes in database. I did that with the following snippet:

columns = ['ingredient'] + (list(df2['recipe_id'].unique()))
ingredient_df = pd.DataFrame(columns=columns)

After i create this DataFrame (which i did already), and populate it (problem i'm having), output should look like this:

In [1]:
# Create and populate ingredient_df by some method
columns = ['ingredient'] + (list(df2['recipe_id'].unique()))
ingredient_df = pd.DataFrame(columns=columns)
ingredient_df = populate_df(ingredient_df, df2)
Out [1]:
In [2]:  
ingredient_df
Out[2]:
   ingredient  ... 3332 3333 3334 3335 3336 ...
...
   11322       ...    1    1    0    1    0 ...
...

In the example above, value at (11322, 3334) is 0 because ingredient 11322 is not present in recipe with id 3334.

In other words, I want for every ingredient to have mapping (ingredient, recipe_id) = 1 if ingredient is present in that recipe, and 0 otherwise.

I've managed to do this by iterating over all recipes and trough all ingredient, but this is very slow. How can i do this in more robust and elegant way using Pandas methods (if this is possible at all)?

Original Q&A

There are 1 answers

**piRSquared** · Accepted Answer · 2016-12-21T17:30:31+00:00

setup

df = pd.DataFrame(
    dict(
        recipe_id=list('abcde'),
        ingredients=[list('xyz'),
                     list('tuv'),
                     list('ytw'),
                     list('vy'),
                     list('zxs')]
    )
)[['recipe_id', 'ingredients']]

df

  recipe_id ingredients
0         a   [x, y, z]
1         b   [t, u, v]
2         c   [y, t, w]
3         d      [v, y]
4         e   [z, x, s]

method 1

df.set_index('recipe_id').ingredients.apply(pd.value_counts) \
    .fillna(0).astype(int).T.rename_axis('ingredients')

recipe_id    a  b  c  d  e
ingredients               
s            0  0  0  0  1
t            0  1  1  0  0
u            0  1  0  0  0
v            0  1  0  1  0
w            0  0  1  0  0
x            1  0  0  0  1
y            1  0  1  1  0
z            1  0  0  0  1

method 2

idx = np.repeat(df.index.values, df.ingredients.str.len())
df1 = df.drop('ingredients', 1).loc[idx]
df1['ingredients'] = df.ingredients.sum()

df1.groupby('ingredients').recipe_id.apply(pd.value_counts) \
    .unstack(fill_value=0).rename_axis('recipe_id', 1)

recipe_id    a  b  c  d  e
ingredients               
s            0  0  0  0  1
t            0  1  1  0  0
u            0  1  0  0  0
v            0  1  0  1  0
w            0  0  1  0  0
x            1  0  0  0  1
y            1  0  1  1  0
z            1  0  0  0  1

TechQA.

How to index data by elements of lists in columns?

There are 1 answers

Related Questions in PYTHON

Related Questions in PYTHON-3.X

Related Questions in PANDAS

Related Questions in GROUP-BY

Related Questions in PANDAS-GROUPBY

Popular Questions

Popular Tags

Trending Questions