I have a dataset like:
import pandas as pd

e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})
I encoded the data using sklearn.preprocessing.LabelEncoder, with the following lines of code:
x = list(e.columns)

# Import label encoder
from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

for i in x:
    # Encode the labels in column i (every column, numeric ones included).
    e[i] = label_encoder.fit_transform(e[i])

print(e)
But this also encodes the numeric columns of int type, which is not what I want.
Encoded dataset:
   col1  col2  col3  col4
0     0     1     0     3
1     0     0     1     0
2     1     5     5     4
3     4     4     4     1
4     3     3     2     5
5     2     2     3     2
How can I rectify this?
One really simple possibility would be to only encode columns with string values. E.g., tweaking your code to be:
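A minimal sketch of such a tweak, assuming that "columns with string values" means every column whose dtype is not numeric. The `pd.api.types.is_numeric_dtype` check is one way to pick them out, not the only one:

```python
import pandas as pd
from sklearn import preprocessing

e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})

label_encoder = preprocessing.LabelEncoder()

for col in e.columns:
    # Assumption: any non-numeric column holds string labels to encode.
    if not pd.api.types.is_numeric_dtype(e[col]):
        e[col] = label_encoder.fit_transform(e[col])

print(e)
```

With this check, col2 and col3 keep their original integer values and only col1 and col4 are replaced by codes.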
or better yet:
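One possible compact version, again only a sketch: select the non-numeric columns with `select_dtypes(exclude="number")` and encode them in a single `apply` call, using a fresh encoder per column. The name `cat_cols` is just illustrative:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})

# Assumption: every non-numeric column should be label-encoded.
cat_cols = e.select_dtypes(exclude="number").columns

# Fit a separate LabelEncoder on each selected column; numeric columns are untouched.
e[cat_cols] = e[cat_cols].apply(lambda s: LabelEncoder().fit_transform(s))

print(e)
```

Selecting the columns up front avoids the explicit loop and makes it easy to inspect `cat_cols` before anything is overwritten.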