I am reading the pandas documentation to understand pandas.get_dummies.
>>> import pandas as pd
>>> l = list('abca')
>>> print(l)
['a', 'b', 'c', 'a']
>>> s = pd.Series(l)
>>> print(s)
0    a
1    b
2    c
3    a
I have created a Series as shown above.
When I called get_dummies on this Series, the output was as below:
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
I could not understand what this output means.
Can we say the new values of the entries are as below?
a --> 100
b --> 010
c --> 001
a --> 100
Also, are they decimal or binary?
Dummy variables are features that are binary: a single column that says whether each row is or isn't some thing. When an existing column holds more than one distinct value, we can split it into one column per unique value. Each new column is either one, signifying that the row had that unique value, or zero, signifying that it did not.

Since each row of s had only one value, it stands to reason that each row of zeros and ones contains exactly one 1, under the column header that matches the value of the corresponding row in s.

Put another way, think of the new a column as telling you where the a's were in s.
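As a minimal sketch of the points above, the following rebuilds the Series from the question and checks each claim. The names dummies, row_sums, a_matches and recovered are illustrative only and not part of pandas. The dtype=int argument is passed because newer pandas versions return True/False columns by default rather than 1/0; that is an assumption about your installed version, so drop it if your pandas does not accept it.

```python
import pandas as pd

# Same Series as in the question.
s = pd.Series(list('abca'))

# One indicator column per unique value; dtype=int asks for 1/0
# instead of True/False (assumes a pandas version that supports dtype).
dummies = pd.get_dummies(s, dtype=int)
print(dummies)

# Each row of s holds exactly one value, so each row has exactly one 1.
row_sums = dummies.sum(axis=1)
print(row_sums.tolist())

# The 'a' column marks the positions where s equals 'a'.
a_matches = (dummies['a'] == (s == 'a').astype(int)).all()
print(a_matches)

# The column holding the 1 in each row gives back the original value.
recovered = dummies.idxmax(axis=1)
print(recovered.tolist())
```

So the rows are not decimal numbers like 100 or 10. Each row is a set of three separate 0/1 flags, one per column, and the position of the 1 identifies the original value.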