I am playing with LGBM and indexed my categorical features using StringIndexer, but after that I never told my model which features are categorical. So I am wondering how it knows which features are categorical.
Here is how I initialize my LGBM model:
val lgbm = new LightGBMClassifier("lgbm").
setObjective("binary").
setFeatureFraction(0.85).
setFeaturesCol("features").
setLabelCol("is_booker")
If you are using mmlspark (you didn't mention how you're using LightGBM in Scala), LightGBM automatically figures out which columns should be treated as categorical, based on the attributes of the columns (see Azure/mmlspark#559).
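To see the column attributes this relies on, here is a minimal sketch that uses only Spark ML, not mmlspark. StringIndexer stores a NominalAttribute in its output column's metadata, and VectorAssembler copies each input column's attribute into the AttributeGroup of the assembled vector. My assumption, not verified line by line against the mmlspark source, is that the slots carrying a NominalAttribute are the ones mmlspark treats as categorical. The object name, the toy data, and the column names `country`, `nights` and `country_idx` are hypothetical.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute}
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object InspectFeatureAttributes {
  // Returns the slot indexes of the "features" vector whose ML attribute is nominal.
  def nominalSlots(spark: SparkSession): Array[Int] = {
    import spark.implicits._

    // Hypothetical toy data: one string feature and one numeric feature.
    val df = Seq(("US", 1.0), ("FR", 2.0), ("US", 3.0)).toDF("country", "nights")

    // StringIndexer attaches NominalAttribute metadata to its output column.
    val indexed = new StringIndexer()
      .setInputCol("country")
      .setOutputCol("country_idx")
      .fit(df)
      .transform(df)

    // VectorAssembler copies each input column's attribute into the
    // AttributeGroup metadata of the assembled vector column.
    val assembled = new VectorAssembler()
      .setInputCols(Array("country_idx", "nights"))
      .setOutputCol("features")
      .transform(indexed)

    // Read the per-slot attributes back from the schema metadata and keep
    // the indexes of the nominal (categorical) slots.
    val group = AttributeGroup.fromStructField(assembled.schema("features"))
    group.attributes.getOrElse(Array.empty[Attribute]).zipWithIndex.collect {
      case (_: NominalAttribute, idx) => idx
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("attrs").getOrCreate()
    try println(nominalSlots(spark).mkString(","))
    finally spark.stop()
  }
}
```

Running `main` should print `0`: only the StringIndexer output (slot 0) is nominal, while `nights` gets a plain numeric attribute. If your own "features" column shows nominal slots in this check, that metadata is what lets the model know which features are categorical without you declaring them.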
The method that accomplishes this is LightGBMUtils.getCategoricalIndexes(), and you can find it at https://github.com/Azure/mmlspark/blob/95c1f8a782191e3578587a49313e1d57abee5da3/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMUtils.scala#L74-L104. That method is re-used by LightGBMBase.getCategoricalIndexes() during training.
If I'm right that you're using mmlspark and you have further questions about how this works, I recommend opening an issue in Azure/mmlspark.