I am training a random forest model on my data with the randomForest package. Some of my variables are of class character. I am pretty sure that randomForest only accepts factor and numeric inputs, so I think R automatically coerces the character variables to numeric.
To understand how this may affect my modelling results, does anyone know how R coerces character to numeric (the algorithm or rule it follows)? Or is there any source code I can look at?
I am using R version 4.0.1.
Thanks in advance.
An update: I checked using
getTree(mod,1,labelVar=TRUE)
I can see that if those character variables are converted to factors, the "split point" in the output is an integer, which means the variable is treated as categorical (see: https://www.rdocumentation.org/packages/randomForest/versions/4.6-14/topics/getTree). If they are not converted to factors, the "split point" in the output is not an integer.
So my guess is that R coerces the values of those character variables into numeric values. But how?
I am not sure right now about random forests in R specifically, but I am fairly convinced that it only takes factors. If it does take characters as well, it will convert them to factor, not to numeric. And there is no clear conversion from character to numeric in R.
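To make the difference concrete, here is a minimal base-R sketch with made-up values (`x` and `df` are hypothetical, not from the question). It shows that coercing text directly with `as.numeric()` only yields `NA`, while going through a factor gives each value the index of its sorted level. Base R's `data.matrix()` documents that same factor-then-integer route for character columns. Whether randomForest itself handles character columns this way is an assumption here, not something this sketch verifies; it would need to be checked against the package source (for example with `getAnywhere("randomForest.default")` after loading the package).

```r
# Toy character vector; the values are invented for illustration only.
x <- c("low", "high", "mid", "high")

# Direct coercion: strings that are not numbers become NA (normally with a warning).
direct <- suppressWarnings(as.numeric(x))

# Coercion via factor: levels are sorted ("high", "low", "mid"),
# then each value is replaced by the index of its level.
f <- factor(x)
codes <- as.integer(f)

# data.matrix() converts character columns to factors and then to integers.
df <- data.frame(x = x, stringsAsFactors = FALSE)
m <- data.matrix(df)

print(direct)     # NA NA NA NA
print(levels(f))  # "high" "low" "mid"
print(codes)      # 2 1 3 1
print(m[, "x"])   # 2 1 3 1
```

If a model received such integer codes as a plain numeric column, the sorted order of the labels would be treated as a meaningful ordering, and a split could fall between two codes (for example at 1.5). Calling `factor()` on the column explicitly before fitting avoids relying on any implicit conversion.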