Let's say I have a specific column in my data frame. Some of the cells contain only 1 value, but some contain as many as 10. I decided to split the column values on the ';' separator.
data['golden_globes_nominee_categories'] = data['golden_globes_nominee_categories'].str.split(';')
After that I iterated by row like this:
for index, row in data.iterrows():
    print(row['golden_globes_nominee_categories'])
And got this:
['Best Original Song - Motion Picture ', ' Best Performance by an Actor in a Motion Picture - Comedy or Musical']
['Best Original Score - Motion Picture ', ' Best Performance by an Actress in a Motion Picture - Drama']
...
Then I looped through each element like this:
for index, row in data.iterrows():
    for x in row['golden_globes_nominee_categories']:
        pass  # do something with each category x
But what I really want to know is how to create a column for every distinct value, containing 1 or 0 to show whether that value was mentioned in a cell.
Essentially I want to do something like this:
dataframe["time_sp_comp2"] = dataframe["time_spend_company"].apply(lambda x: 1 if x==2 else 0)
dataframe["time_sp_comp3"] = dataframe["time_spend_company"].apply(lambda x: 1 if x==3 else 0)
dataframe["time_sp_comp4"] = dataframe["time_spend_company"].apply(lambda x: 1 if x==4 else 0)
dataframe.drop('time_spend_company', axis=1, inplace=True)
I think this is what you're after.
Example data
Split tag strings, get dummy variables (you could do this with pd.get_dummies)
Merge names and dummy variables
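The steps above can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact code the answer had in mind: the "name" column and the film names are made up for illustration, and only the category strings come from the question. It assumes pandas 0.25 or newer for Series.explode.

```python
import pandas as pd

# Hypothetical example data: the "name" column and its values are invented;
# the category strings are taken from the output shown in the question.
data = pd.DataFrame({
    "name": ["film_a", "film_b", "film_c"],
    "golden_globes_nominee_categories": [
        "Best Original Song - Motion Picture ; Best Performance by an Actor in a Motion Picture - Comedy or Musical",
        "Best Original Score - Motion Picture ; Best Performance by an Actress in a Motion Picture - Drama",
        "Best Original Song - Motion Picture",
    ],
})

# Split each cell on ';' and strip the stray spaces around every category,
# so that 'X ' and ' X' do not end up as separate columns.
split = data["golden_globes_nominee_categories"].str.split(";")
cleaned = split.apply(lambda items: [item.strip() for item in items])

# explode() yields one row per (film, category) pair and keeps the original
# index; get_dummies() one-hot encodes those rows; grouping by the index and
# taking the max collapses them back to one 0/1 row per film.
dummies = pd.get_dummies(cleaned.explode(), dtype=int).groupby(level=0).max()

# Merge names and dummy variables on the shared index.
result = data[["name"]].join(dummies)
print(result)
```

If the stray spaces around the separators are cleaned up first, Series.str.get_dummies(sep=';') on the original unsplit strings should give the same 0/1 columns in a single call.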