I have a very random (nearly evenly split) population that I'm trying to split using a binary decision tree.
Class   Probability
TRUE    51%
FALSE   49%
So the entropy is 1 (rounded to 3 decimal places). My thinking is that, for any feature, the entropy after the split will also be 1, and thus there is no information gain.
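For reference, this is the calculation (a minimal sketch in Python, assuming base-2 logarithms, i.e. entropy in bits):

```python
from math import log2

# Entropy of the class distribution: H = -sum(p * log2(p))
p_true, p_false = 0.51, 0.49
entropy = -(p_true * log2(p_true) + p_false * log2(p_false))
print(round(entropy, 3))  # 1.0 -- essentially the maximum of 1 bit for two classes
```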
Am I doing this right? In my reading so far I haven't come across anything saying that entropy is useless when there are only two classes.
The entropy/information gain doesn't depend so much on the distribution of the classes as on the information contained in the features used to characterise the instances in your data set. If, for example, you had a feature that was always 1 for the TRUE class and always 2 for the FALSE class, it would have the highest possible information gain, because it would allow you to separate the two classes perfectly.

If the information gain you're getting is very small, it indicates that the information contained in the features is not useful for separating your classes. In that case, you need to find more informative features.
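To illustrate, here is a minimal sketch in Python (the entropy and information_gain helpers are written just for this example) contrasting a perfectly separating feature with a constant, uninformative one:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the child splits."""
    n = len(labels)
    children = {}
    for label, value in zip(labels, feature_values):
        children.setdefault(value, []).append(label)
    weighted = sum(len(subset) / n * entropy(subset)
                   for subset in children.values())
    return entropy(labels) - weighted

# Toy data with the 51/49 class split from the question.
labels = [True] * 51 + [False] * 49

# A feature that is always 1 for TRUE and always 2 for FALSE
# separates the classes perfectly.
perfect = [1] * 51 + [2] * 49
print(information_gain(labels, perfect))  # ~0.9997

# A constant feature tells us nothing about the class.
useless = [1] * 100
print(information_gain(labels, useless))  # 0.0
```

The perfectly separating feature recovers the full parent entropy of about 0.9997 bits as gain, while the constant feature yields exactly zero, so a near-50/50 class distribution does not by itself prevent information gain.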