I have a set of data from a drill hole that contains information about different geomechanical properties every 2 meters. I am trying to create geomechanical domains and assign each point to a domain.
I am trying to use random forest classification, but I am unsure how to relate the proximity matrix (or any other result from the randomForest function) to labels.
My humble code so far is as follows:
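# Load the drill-hole measurements (one row per 2 m interval)
dh <- read.csv("gt_1_classification.csv", header = TRUE)
# Replace all NA values with 0
dh[is.na(dh)] <- 0
library(randomForest)
# Omitting the response (y) makes randomForest run in unsupervised mode;
# proximity = TRUE is required for the proximity matrix to be returned
dh_rf <- randomForest(dh, importance = TRUE, proximity = TRUE, ntree = 500)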
I would like the classifier to decide the domains on its own.
Any help would be great!
Hack-R is correct -- first it is necessary to explore the data using some clustering (unsupervised learning) methods. I've provided some sample code using the R built-in mtcars data as a demonstration:
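A minimal sketch of that kind of exploration on mtcars, using only base R. The choice of methods (Ward hierarchical clustering and k-means), the scaling step, and k = 3 are assumptions made for illustration, not a recommendation for the drill-hole data:

```r
# Scale the columns so that no single variable dominates the distances
cars_scaled <- scale(mtcars)

# Hierarchical clustering on Euclidean distances (Ward linkage)
hc <- hclust(dist(cars_scaled), method = "ward.D2")
plot(hc)                        # inspect the dendrogram for natural groupings
hc_groups <- cutree(hc, k = 3)  # k = 3 is an arbitrary choice for illustration

# k-means with the same number of clusters, for comparison
set.seed(42)
km <- kmeans(cars_scaled, centers = 3, nstart = 25)

# Cross-tabulate the two assignments to see how well they agree
table(hclust = hc_groups, kmeans = km$cluster)
```

If the two methods largely agree, that is some evidence the grouping is not an artifact of one algorithm; if they disagree, it is worth trying other values of k or other distance measures.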
After running on your own data, consider which definition of clusters captures the level of similarity of interest to you. You can then create a new variable with a "level" for each cluster and fit a supervised model to it.
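One way to connect this to the proximity matrix from the question is sketched below, again on mtcars. It runs randomForest without a response (unsupervised mode), clusters on 1 - proximity, and uses the resulting cluster numbers as a factor label for a supervised forest. The linkage method, k = 3, and the variable names are assumptions for illustration only:

```r
library(randomForest)

set.seed(42)
# No response supplied, so randomForest runs in unsupervised mode
rf_unsup <- randomForest(x = mtcars, ntree = 500, proximity = TRUE)

# Turn proximity (a similarity) into a dissimilarity and cluster on it
hc <- hclust(as.dist(1 - rf_unsup$proximity), method = "average")

# New variable: one factor level per cluster (k = 3 is an assumption)
cluster <- factor(cutree(hc, k = 3))

# Supervised model predicting the cluster label from the original variables
rf_sup <- randomForest(x = mtcars, y = cluster, ntree = 500, importance = TRUE)
print(rf_sup)
```

The variable importance from the supervised fit then gives a rough indication of which properties drive the separation between clusters.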
Here's a decision tree example using the same mtcars data. NOTE that here I used mpg as the response -- you would want to use your new variable based on the clusters.
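A minimal sketch of such a tree, assuming the rpart package (the original answer may have used a different one). With mpg as a numeric response this is a regression tree; for a factor of cluster labels you would use method = "class" instead:

```r
library(rpart)

# Regression tree with mpg as the response and all other columns as predictors
tree <- rpart(mpg ~ ., data = mtcars, method = "anova")

print(tree)               # text summary of the splits
plot(tree, margin = 0.1)  # draw the tree structure
text(tree, use.n = TRUE)  # label the splits and show node sizes
```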
Note that, although very informative, a basic decision tree is often not great for prediction. If prediction is the goal, other models should also be explored.