I'm studying k-anonymization and the Mondrian algorithm proposed by LeFevre. The paper says that at one point in the algorithm, we have to choose a feature in the dataframe according to which feature has the largest range of normalized values.
For example, if I have the feature Age in my dataset with the values [13, 15, 24, 30], I understand that the range is 13-30, but as soon as you normalize it, wouldn't it always be [0, 1]?
I know the question seems strange, but I couldn't find anything on the internet or in the paper itself that explains in more detail what was meant.
It depends on the normalization technique, but yes. If we use min-max normalization, the result will always lie in [0, 1]. What you can do is split that variable into segments and then normalize your data. Whenever you use min-max normalization, the minimum value of that feature is transformed into 0 and the maximum value into 1. Maybe mean normalization could give you a different result in that case.
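To make this concrete, here is a minimal Python sketch using the Age values from the question. The function names are hypothetical, chosen only for illustration. The last helper, `normalized_span`, reflects one common reading of "normalized range" in Mondrian-style partitioning: the width of a segment divided by the width of the feature over the whole dataset. That reading is an assumption here, not something confirmed in this thread.

```python
def min_max(values):
    """Min-max normalization: the minimum maps to 0, the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_normalize(values):
    """Mean normalization: subtract the mean, divide by (max - min)."""
    lo, hi = min(values), max(values)
    mean = sum(values) / len(values)
    return [(v - mean) / (hi - lo) for v in values]


def normalized_span(segment, full_column):
    """Width of a segment relative to the width of the full column.

    Assumption: this is one possible interpretation of "normalized range",
    where the normalization uses the whole dataset, not the segment itself.
    """
    return (max(segment) - min(segment)) / (max(full_column) - min(full_column))


age = [13, 15, 24, 30]

scaled = min_max(age)           # endpoints are exactly 0.0 and 1.0
centered = mean_normalize(age)  # endpoints are shifted around 0

# A segment holding only the two youngest records, measured against the full column.
young_span = normalized_span([13, 15], age)  # (15 - 13) / (30 - 13)
```

Note that mean normalization moves the endpoints (here to roughly -0.44 and 0.56) but the distance between them is still 1, so on the full column it does not change the width either. Under the assumption above, the width only becomes informative once a segment is compared against the full column, which lets features measured in different units be compared on the same scale.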