Standardizing or normalizing discrete variables?


I have a dataframe with discrete variables such as age, number of sick leaves, number of kids in the family, and number of absences, and I want to build a prediction model with a binary outcome. Is it okay to include these variables, along with the other continuous numeric variables, in a standardization or normalization process?

Or should I bin these discrete variables into categorical variables and turn them into dummy variables?


There are 2 answers

Sahil_Angra On

If they are not target variables, it is okay to include these variables along with the other continuous numeric variables in a standardization or normalization process.
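As a rough illustration of that point, here is a minimal sketch that applies the same z-score standardization and min-max normalization to discrete and continuous columns alike. The column names and values are made up for the example, and it uses only the standard library; in practice you would more likely reach for something like scikit-learn's StandardScaler or MinMaxScaler.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-max normalization: rescale the column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy data: two discrete columns and one continuous column.
features = {
    "age": [25, 32, 47, 51, 38],
    "num_kids": [0, 2, 1, 3, 0],
    "income": [31000.0, 45500.5, 52000.0, 61250.75, 40000.0],
}

# The discrete columns go through exactly the same transformation
# as the continuous one.
standardized = {name: standardize(col) for name, col in features.items()}
normalized = {name: min_max(col) for name, col in features.items()}
```

After this, every standardized column has mean 0 and standard deviation 1, and every normalized column lies between 0 and 1, regardless of whether it started out as counts or as real-valued measurements.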

P RAY On

I agree with Sahil_Angra, but some points need to be added for clarity. I think it helps to refer back to the idea of scales of measurement in statistics, of which there are four (nominal, ordinal, interval, and ratio).
Here is a resource:
https://studyonline.unsw.edu.au/blog/types-of-data
If you refer to it, you will see that data falls into two broad types to begin with: quantitative and qualitative.
For qualitative data, you cannot form a numeric comparison scale. For example, there is no point in taking a ratio of male to female gender. You can take the ratio of the counts of male and female members of a group, but you cannot do that at the individual level. These data items define a category, so we call them categorical. You can generate dummy variables from them to work around the fact that some algorithms cannot handle categories directly.
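To make the dummy variable idea concrete, here is a small sketch of one-hot encoding a categorical column by hand. The function name, the column name, and the data are all hypothetical; pandas users would typically call pandas.get_dummies instead.

```python
def one_hot(name, values, drop_first=False):
    """Turn one categorical column into 0/1 dummy columns.

    drop_first=True omits the first category (in sorted order), which
    avoids perfectly collinear columns in models such as linear or
    logistic regression.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {
        f"{name}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


# Hypothetical categorical column.
gender = ["male", "female", "female", "male"]

dummies = one_hot("gender", gender)
dummies_reduced = one_hot("gender", gender, drop_first=True)
```

Here `dummies` holds two columns, `gender_female` and `gender_male`, while `dummies_reduced` keeps only `gender_male`, since the dropped category is implied when that column is 0.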
Coming back to your examples, such as age and number of children: they are all numerically comparable (less than, more than, and ratios). So they are quantitative, and hence the conclusion is what Sahil_Angra said above.
I would add one thing: if one of these is the target of a regression problem, there is no point in treating it as categorical. But if you bucket it somehow and try to classify, then depending on how you formulate the problem, you may need to dummy-code it.
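For the bucketing case, a sketch along these lines shows how a count could be turned into class labels for a classification setup. The cut points, labels, and data are assumptions chosen purely for illustration; with pandas, pandas.cut does a similar job.

```python
from bisect import bisect_right


def bucket(values, edges, labels):
    """Assign each value to a labelled bucket.

    `edges` are the sorted lower bounds of every bucket after the first,
    so there must be exactly one more label than there are edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical data: number of absences per person.
absences = [0, 3, 7, 1]

# Assumed cut points: 0 -> "none", 1 to 4 -> "few", 5 or more -> "many".
edges = [1, 5]
labels = ["none", "few", "many"]

absence_class = bucket(absences, edges, labels)
```

The resulting `absence_class` column is categorical, so if it is used as a model input (or as a multi-class target for an algorithm that expects indicator columns), it would then be dummy-coded like any other categorical variable.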