I have a DataFrame in which some columns (C1, C2, C3) are categorical (string) variables. The data and dtypes are as follows:
C1 C2 C3 C4 C5 \
4 b'02e197c5' b'c2ced437' b'a2427619' b'3f85ecae' b'b8c51ab7'
9 b'62770d79' b'ad984203' b'ddd956c1' b'f7f54f97' b'bbaea1c0'
13 b'7ffd46c3' b'710103fd' b'a1407382' b'f2463ffb' b'664ff944'
14 b'9a8cb066' b'7a06385f' b'417e6103' b'6faef306' b'f8990a45'
45 b'6f877ce8' b'58cc2d25' b'9b48ba97' b'f2463ffb' b'd90dd51f'
Datatype:
C1 object
C2 object
C3 object
Then I used DictVectorizer to one-hot encode the strings:
labelTransformer = DictVectorizer(dtype='str')
labelTransformer.fit_transform(clickDataFrame["C1"].astype("str"))
But after that, I get the following error:
File "click_main.py", line 60, in <module>
df2 = labelTransformer.fit_transform(clickDataFrame["C1"].astype("str"))
File "/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 230, in fit_transform
return self._transform(X, fitting=True)
File "/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 166, in _transform
for f, v in six.iteritems(x):
File "/usr/local/lib/python3.6/dist-packages/sklearn/externals/six.py", line 439, in iteritems
return iter(getattr(d, _iteritems)(**kw))
AttributeError: 'str' object has no attribute 'items'
I have tried a lot of things, but I can't find a solution.
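DictVectorizer expects a sequence of dicts (feature name to value mappings), which is why passing a Series of strings fails with 'str' object has no attribute 'items'. You can get one-hot encodings directly from pandas using pd.get_dummies. If you want to treat each column independently, you can simply do pd.get_dummies(df) or pd.get_dummies(df.C1). If you want to obtain an indicator for every unique value across all columns, you can use pd.get_dummies(df.stack()).unstack().swaplevel(0, 1, axis=1).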
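As a rough illustration of the difference between these calls, here is a minimal sketch on a toy frame. The column values ("a", "b", "x", "y") are made up for the example and are not the data from the question; the variable names per_column, single and across are likewise only illustrative.

```python
import pandas as pd

# Toy stand-in for the question's DataFrame (values are hypothetical).
df = pd.DataFrame({
    "C1": ["a", "b", "a"],
    "C2": ["x", "x", "y"],
})

# Each column is encoded independently; new columns are named <column>_<value>.
per_column = pd.get_dummies(df)

# Encoding a single column gives one indicator column per unique value.
single = pd.get_dummies(df["C1"])

# One indicator per unique value across all columns, for every original column.
# The result has MultiIndex columns of the form (original column, value).
across = pd.get_dummies(df.stack()).unstack().swaplevel(0, 1, axis=1)

print(per_column)
print(single)
print(across)
```

Depending on the pandas version, the indicators come back as booleans or as 0/1 integers; appending .astype(int) should give 0/1 in either case. Note that the third form also creates all-zero columns for values that never occur in a given column (for example ("C2", "a") here).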