Does XGBoost need standardization or normalization?

Question

Asked by JeongJoo Lee on 18 April 2022 at 01:26 · 4.2k views

The answer linked below confirms that XGBoost does not require normalization. However, on the dataset we are using now, we only get high performance when we standardize the features.

Is standardization necessary when the features within a single sample are on very different scales?

Here is one sample of 18 features. The tenth feature is always equal to 60.

[ 0.001652 0.000434 0.00312 0.000494 -0.093933 4.173985 6.314583 1.138626 3.807321 60. 0.000108 0.000131 0.000272 0.000067 0.000013 0.000013 0.000108 0.000108]

https://datascience.stackexchange.com/a/60954


Your rationale is indeed correct: decision trees do not require normalization of their inputs; and since XGBoost is essentially an ensemble algorithm comprised of decision trees, it does not require normalization for the inputs either.

For corroboration, see also the thread "Is Normalization necessary?" at the XGBoost GitHub repo, where the answer by the lead XGBoost developer is a clear:

no you do not have to normalize the features

There is 1 answer

**Venkatesh Kuppusamy** · Answer 1 · 2023-01-20T07:02:16+00:00

Standardization comes into the picture when features of the input data set have large differences between their ranges, or simply when they are measured in different units (e.g., pounds, meters, miles, etc.).

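Tree-based algorithms such as decision trees, random forests and gradient boosting are not sensitive to the magnitude of variables, because each split depends only on how the values of a feature are ordered, not on their scale. So standardization is not needed before fitting these kinds of models. Logistic regression is likewise unaffected when it is fitted without regularization, but a regularized fit does depend on feature scale.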
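
The claim that tree-based models ignore feature magnitude can be checked on a single split. The sketch below is a minimal pure-Python illustration, not XGBoost's actual implementation: `best_split_partition`, `sse` and `standardize` are hypothetical helper names, and the toy data is made up. It finds the best squared-error split on one feature and compares the resulting partition of samples before and after standardization.

```python
from statistics import mean, pstdev


def sse(values):
    """Sum of squared errors around the mean (the leaf impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split_partition(xs, ys):
    """Return the set of sample indices sent left by the best single split.

    Candidate splits are taken between consecutive sorted feature values,
    and the one with the lowest total squared error is kept.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_err, best_left = float("inf"), None
    for k in range(1, len(order)):
        # Equal neighbouring values cannot be separated by a threshold.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        err = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if err < best_err:
            best_err, best_left = err, frozenset(left)
    return best_left


def standardize(xs):
    """Rescale to zero mean and unit variance (a monotone transformation)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


# Hypothetical toy data: one small-scale feature and a target.
x_raw = [0.000108, 0.000131, 0.000272, 0.000067, 0.000013, 0.000494]
y = [1.0, 1.2, 3.1, 0.9, 0.8, 3.3]

same = best_split_partition(x_raw, y) == best_split_partition(standardize(x_raw), y)
print(same)
```

Standardization changes the threshold value but not the order of the samples, so the same samples end up on each side of the split. XGBoost scores splits with gradient statistics rather than plain squared error, but its tree boosters rely on the same ordering property.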

Reference: https://builtin.com/data-science/when-and-why-standardize-your-data
