Does XGBoost need standardization or normalization?


According to the answer linked below, XGBoost does not require normalization. However, on the dataset we are using now, we only get high performance when we standardize the features.

Is standardization necessary when the scales of the features within a sample differ greatly?

Here is one sample of 18 features. The tenth feature is always equal to 60.

[ 0.001652 0.000434 0.00312 0.000494 -0.093933 4.173985 6.314583 1.138626 3.807321 60. 0.000108 0.000131 0.000272 0.000067 0.000013 0.000013 0.000108 0.000108]

https://datascience.stackexchange.com/a/60954


Your rationale is indeed correct: decision trees do not require normalization of their inputs; and since XGBoost is essentially an ensemble algorithm comprised of decision trees, it does not require normalization for the inputs either.

For corroboration, see also the thread "Is Normalization necessary?" at the XGBoost GitHub repo, where the answer by the lead XGBoost developer is a clear:

no you do not have to normalize the features

1 answer

Answer by Venkatesh Kuppusamy:

Standardization becomes relevant when the features of the input data set have very different ranges, or simply when they are measured in different units (e.g., pounds, meters, miles).
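To make the term concrete, here is a minimal sketch of z-score standardization using only the Python standard library. The function name `standardize_columns` and the data (weight in pounds, height in meters, plus a constant column like the tenth feature in the question) are hypothetical. Mapping a zero-variance column to 0.0 is a choice made for this sketch, not something the thread prescribes.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Z-score each feature (column): subtract its mean, divide by its std."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [
        # A constant column has sigma == 0; this sketch maps it to 0.0
        # instead of dividing by zero (an assumption, see the lead-in).
        [(v - mu) / sigma if sigma > 0 else 0.0 for v, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]

# Hypothetical samples: [weight in pounds, height in meters, constant feature]
rows = [
    [150.0, 1.60, 60.0],
    [180.0, 1.75, 60.0],
    [210.0, 1.90, 60.0],
]
scaled = standardize_columns(rows)
for row in scaled:
    print(row)
```

Note that the statistics are computed per feature across samples, not across the features of a single sample.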

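Tree-based algorithms such as decision trees, random forests and gradient boosting are not sensitive to the magnitude of variables: each split only compares a feature against a threshold, so any order-preserving rescaling of that feature produces the same partition of the samples. Logistic regression is likewise unaffected in its fitted predictions when no regularization is used, although regularized variants and gradient-based solvers can be sensitive to feature scale. So standardization is not needed before fitting tree-based models, including XGBoost with its default tree booster.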
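The scale-invariance of a tree split can be checked with a toy example. The sketch below is not XGBoost; it is a hand-written single-feature decision stump (the hypothetical helpers `best_split` and `standardize`) that picks the squared-error-minimising threshold. The feature values and targets are made up for illustration, with magnitudes similar to those in the question.

```python
from statistics import mean, pstdev

def best_split(xs, ys):
    """Return (threshold, left_indices) of the squared-error-minimising split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between identical values
        left, right = order[:k], order[k:]
        left_mean = mean(ys[i] for i in left)
        right_mean = mean(ys[i] for i in right)
        sse = (sum((ys[i] - left_mean) ** 2 for i in left)
               + sum((ys[i] - right_mean) ** 2 for i in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, frozenset(left))
    return best[1], best[2]

def standardize(xs):
    """Z-score a single feature (assumes it is not constant)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

# Hypothetical feature with widely varying magnitudes, and made-up targets.
x = [0.000067, 0.000108, 0.000131, 0.000272, 1.138626, 3.807321, 4.173985, 6.314583]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

raw_thr, raw_left = best_split(x, y)
std_thr, std_left = best_split(standardize(x), y)

print("same partition:", raw_left == std_left)
print("thresholds:", raw_thr, std_thr)
```

The threshold value changes with the scale, but the samples sent to each side of the split do not, which is why the predictions of a tree are unchanged. If standardization nevertheless changes results in practice, the cause is likely elsewhere in the pipeline rather than in the tree splits themselves.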

Reference: https://builtin.com/data-science/when-and-why-standardize-your-data