Search for corresponding node in a regression tree using rpart

Question

Asked by antoine on 24 February 2011 at 09:33

I'm pretty new to R and I'm stuck with a pretty dumb problem.

I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting.

Thanks to R the calibration part is easy to do and easy to control.

#the package rpart is needed
library(rpart)

# Loading of a big data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

# Regression tree calibration
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + 
                      Attribute4 + Attribute5, 
                      method="anova", data=my_data, 
                      control=rpart.control(minsplit=100, cp=0.0001))

After calibrating a big decision tree, I want to find, for each row of some new data, the leaf (cluster) it falls into, and thus its forecasted value.
The predict function seems perfect for this.

# read validation data
validationData <- read.csv("my_sample.csv", sep=",", header=TRUE)

# search for the probability in the tree
predict <- predict(tree, newdata=validationData, class="prob")

# dump them in a file
write.table(predict, file="dump.txt")

However, with the predict method I only get the forecasted ratio of my new elements, and I can't find a way to get the decision tree leaf to which each new element belongs.

I think it should be pretty easy to get since the predict method must have found that leaf in order to return the ratio.

There are several values that can be given to the predict method through the class= argument, but for a regression tree they all seem to return the same thing (the value of the target attribute of the decision tree).

Does anyone know how to get the corresponding node in the decision tree?

Analyzing that node with the path.rpart method would help me understand the results.

There are 4 answers

**Benjamin** · Answer 1 · 2011-03-10T18:21:31+00:00

I think what you want is type="vector" instead of class="prob" (I don't think class is an accepted parameter of the predict method), as explained in the rpart docs:

If type="vector": vector of predicted responses. For regression trees this is the mean response at the node, for Poisson trees it is the estimated response rate, and for classification trees it is the predicted class (as a number).

**yuji** · Answer 2 · 2011-06-21T19:15:06+00:00

Benjamin's answer unfortunately doesn't work: type="vector" still returns the predicted values.

My solution is pretty kludgy, but I don't think there's a better way. The trick is to replace the predicted y values in the tree's frame with the corresponding node numbers.

# Work on a copy so the original tree keeps its fitted values
tree2 <- tree
# Row names of the frame are the node numbers
tree2$frame$yval <- as.numeric(rownames(tree2$frame))
nodes <- predict(tree2, newdata=validationData)

Now nodes holds node numbers as opposed to predicted y values.

(One note: the above worked in my case where tree was a regression tree, not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)
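
To make the trick above concrete, here is a minimal sketch on made-up inputs. It assumes the built-in mtcars data in place of the asker's CSV files, and the names train, new_rows, fit, fit_nodes and leaf_id are purely illustrative. It also feeds the resulting node numbers to path.rpart, which is what the question was ultimately after.

```r
library(rpart)

# Illustrative data only: split the built-in mtcars set into "calibration"
# and "new" rows (the asker's CSV files are not available here).
train <- mtcars[1:24, ]
new_rows <- mtcars[25:32, ]

fit <- rpart(mpg ~ wt + hp + disp, data = train, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# Copy the tree and overwrite every node's fitted value with its node number.
fit_nodes <- fit
fit_nodes$frame$yval <- as.numeric(rownames(fit_nodes$frame))

# predict() on the copy now returns the node number of each row's leaf.
leaf_id <- predict(fit_nodes, newdata = new_rows)

# Consistency check: the untouched tree's prediction for a row is the yval
# stored in the frame for that row's leaf.
stopifnot(isTRUE(all.equal(unname(predict(fit, newdata = new_rows)),
                           fit$frame[as.character(leaf_id), "yval"])))

# Print the splitting rules that lead to each leaf that was reached.
path.rpart(fit, nodes = unique(leaf_id))
```

The original fit is left untouched, so ordinary predictions and node lookups can be made side by side.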

**Heidi** · Answer 3 · 2016-06-06T12:35:35+00:00

You can use the partykit package:

library("rpart")  # provides rpart() and the kyphosis data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

library("partykit")
fit.party <- as.party(fit)
predict(fit.party, newdata = kyphosis[1:4, ], type = "node")

For your example, just use:

predict(as.party(tree), newdata = validationData, type = "node")

**DianaS** · Answer 4 · 2022-03-10T10:09:24+00:00

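treeClust::rpart.predict.leaves(tree, validationData) returns the leaf for each row of the new data.
Also, tree$where gives the leaf of each training observation, as a row index into tree$frame; rownames(tree$frame)[tree$where] converts those indices to node numbers.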
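
As a rough illustration of the tree$where part, the sketch below assumes the built-in mtcars data rather than the asker's file. It converts the row indices stored in where into rpart node numbers and checks the per-leaf counts against the frame. The names fit, node_of_row, leaf_counts and leaves are illustrative.

```r
library(rpart)

# Illustrative regression tree on built-in data (no missing values, no weights).
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# fit$where holds, for each training row, the ROW INDEX in fit$frame of its
# leaf; the frame's row names are the actual node numbers.
node_of_row <- as.integer(rownames(fit$frame))[fit$where]

# Number of training observations per leaf, keyed by node number.
leaf_counts <- table(node_of_row)

# Cross-check against the counts rpart stored for each leaf.
leaves <- fit$frame[fit$frame$var == "<leaf>", ]
stopifnot(all(leaf_counts[rownames(leaves)] == leaves$n))

print(leaf_counts)
```

The node numbers obtained this way can be passed directly to path.rpart(fit, nodes = ...).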
