I have a dataset with labels from 0 to 4. I one-hot encode them into NumPy arrays using the following code:
labels = (np.arange(5) == labels[:, None]).astype(np.float32)
My question is: do we have to one-hot encode them? Can I just keep the labels numeric/float from 0 to 4 and use them? If so, how?
Is one hot encoding required in Keras?
2.8k views Asked by Digvijay Sawant At
There are 2 answers
One-hot encoding lets the network estimate a probability for each class. A network will never give you that when the class is encoded as a single real or integer number. If you have similar classes, say 1 and 5, and the network confuses them, it will output something close to their mean: some number between 1 and 5, which may be 2, 3, 4, or anything in between. But those numbers encode other classes, so the output will be completely wrong.
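The averaging effect described above can be illustrated without any network. This is a minimal sketch, assuming a hypothetical batch of inputs the model cannot tell apart, half labelled 0 and half labelled 4 (the variable names are made up for illustration). It relies on two standard facts: the constant that minimizes squared error is the mean of the targets, and the constant distribution that minimizes cross-entropy is the empirical class frequency.

```python
from collections import Counter

# Hypothetical labels for inputs the network cannot distinguish:
# half of them are class 0, half are class 4.
ambiguous_labels = [0, 4, 0, 4]
num_classes = 5

# Single-number target with squared error: the best constant
# prediction is the mean of the labels.
best_single_output = sum(ambiguous_labels) / len(ambiguous_labels)

# One-hot targets with cross-entropy: the best constant prediction
# is the empirical frequency of each class.
counts = Counter(ambiguous_labels)
best_class_probs = [counts[c] / len(ambiguous_labels) for c in range(num_classes)]

print(best_single_output)  # 2.0
print(best_class_probs)    # [0.5, 0.0, 0.0, 0.0, 0.5]
```

The single output lands on 2.0, which reads as class 2 even though no sample belongs to it, while the per-class output keeps the uncertainty between 0 and 4 visible.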
What are your labels?
Are they "levels", "intensity" or some "measure"?
If yes, it's better not to encode them. Just compress them into the range 0 to 1. Your model will then compute a single result, which is the intensity. It will never be exact, though.
If not, then you should encode your labels, because the numbers are not related to each other. They are "discrete", and it's better for your model to produce discrete results too. Each output will then be the likelihood (maybe not exactly a probability, depending on your model's parameters) of the input belonging to each class.
So, in short: if you want to measure the intensity of something, use a single variable. If you want the likelihood of different "classes", create the one-hot vector.
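The two options can be sketched side by side in NumPy. This is only an illustration under assumptions: the sample labels are hypothetical, and dividing by `num_classes - 1` is one possible way to compress labels 0 to 4 into the range 0 to 1. The one-hot line is the same expression used in the question.

```python
import numpy as np

labels = np.array([0, 3, 4, 1])  # hypothetical integer labels in 0..4
num_classes = 5

# "Intensity" reading: one value per sample, compressed into [0, 1].
intensity_targets = labels.astype(np.float32) / (num_classes - 1)

# "Class" reading: one row per sample with a single 1.0 in the
# column of its class, built as in the question.
one_hot_targets = (np.arange(num_classes) == labels[:, None]).astype(np.float32)

print(intensity_targets.shape)  # (4,)
print(one_hot_targets.shape)    # (4, 5)
```

The first array would pair with a single-output model, the second with a model that has one output per class. Not covered in the answers above: Keras also provides a `sparse_categorical_crossentropy` loss that takes integer class indices directly, so the one-hot array does not have to be built by hand while the model still produces one output per class.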