Building an rpart decision tree: error "invalid type (list)"


I am new to coding and I am trying to build a decision tree with rpart, but I keep getting the same error:

Error in model.frame.default(formula = df ~ df$Open.Closed + df$Region, : invalid type (list) for variable 'df'

I have searched this site and haven't found a solution to my problem. I have tried several fixes, but I usually end up with a different error saying that the data is a matrix, which rpart won't accept. Any help would be much appreciated.

This is my code:

library(rpart.plot)
library(ggExtra)
library(gridExtra)
library(RGtk2)
library(rpart)
library(rattle)
df[] <- data.frame(lapply(Test_Bank_Model,factor))
df [col_names] <- lapply(df[col_names], factor)

str(df)
summary(df)
print(df)


tree <- rpart(df ~ df$Open.Closed + df$Region, data = df, method = "class",
          model = TRUE, control = rpart.control("minsplit" = 1))
rpart.plot(tree, roundint = FALSE, box.palette = "white")
Data:
    Region      Closing.Date  Annual.Average.FedFunds  Open.Closed
1   South       2020          0.2328571                Closed
2   Mid West    2020          0.2328571                Closed
3   North East  2020          0.2328571                Open
4   South       2020          0.2328571                Open
5   North East  2020          0.2328571                Open
6   West        2020          0.2328571                Open
7   North East  2020          0.2328571                Open
8   North East  2019          1.7366667                Closed
9   South       2019          1.7366667                Closed
10  Mid West    2019          1.7366667                Closed

Answer by Mako:

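From the error message I take it that a list object is being used where a single variable is needed. The variable named in the message is 'df': in df ~ df$Open.Closed + df$Region the whole data frame sits on the left-hand side of ~, i.e. it is used as the response, and a data frame is internally a list of columns. model.frame(), which rpart calls, only accepts vector-like variables (numeric, character, logical, factor) in a formula, so the response has to be a single column. In my code below I used Annual.Average.FedFunds; substitute whichever column you actually want to predict.

lapply also returns its results as a list, so when you convert columns with it, check with str() that you still have a data frame.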
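A minimal base-R sketch that reproduces the message, assuming a small made-up data frame d (hypothetical, not from the question). It calls model.frame() directly because that is the function named in the error, first with the whole data frame as the response and then with a single column:

```r
# Hypothetical toy data, for illustration only
d <- data.frame(y = factor(c("a", "b", "a", "b")), x = c(1, 2, 3, 4))

# The whole data frame on the left of ~ is a list, so model.frame() rejects it
msg <- tryCatch(
  model.frame(d ~ x, data = d),
  error = function(e) conditionMessage(e)
)
print(msg)

# A single column as the response builds a normal model frame
mf <- model.frame(y ~ x, data = d)
str(mf)
```

The same rule applies inside rpart(): one column name on the left of ~, predictors on the right.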

I made a data frame called 'Test_Bank_Model' from your data, took its column names, and excluded 'Annual.Average.FedFunds' from the conversion to factor. (Closing.Date gets converted as well; I'm not sure what you want to do with the years.)
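A small sketch of the conversion step, using a made-up data frame d and a column vector cols (both hypothetical). It shows that lapply() by itself returns a plain list, while assigning that list back into the selected columns keeps d a data frame:

```r
# Hypothetical toy data: two text columns and one numeric column
d <- data.frame(
  Region = c("South", "West", "South"),
  Open.Closed = c("Open", "Closed", "Open"),
  Rate = c(0.23, 0.23, 1.74)
)
cols <- c("Region", "Open.Closed")

# lapply() on its own always returns a plain list
converted <- lapply(d[cols], factor)
print(class(converted))   # "list"

# Assigning the list back into the selected columns keeps d a data frame
d[cols] <- converted
str(d)
```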

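In rpart you can specify the data frame via the data argument, as you did. When you do, you can refer to the columns by their bare names and save yourself retyping the data frame's name. Writing df$Region inside the formula still fits a model, but predict() will then ignore any newdata you pass later, so bare names are the safer habit.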
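A base-R illustration of that difference, using lm() instead of rpart() because both build their model frame through model.frame(); the data frames train and new_data are hypothetical:

```r
# Hypothetical data: y is exactly 2 * x
train <- data.frame(x = 1:10, y = 2 * (1:10))
new_data <- data.frame(x = c(100, 200))

fit_bare   <- lm(y ~ x, data = train)             # bare column names
fit_dollar <- lm(train$y ~ train$x, data = train) # data frame name repeated

# Bare names are looked up in new_data: one prediction per new row
p_bare <- predict(fit_bare, newdata = new_data)

# train$x still resolves to the original train data frame, so new_data is
# ignored; R warns and returns the 10 fitted values instead
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new_data))

length(p_bare)
length(p_dollar)
```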

library(rpart)
library(rpart.plot)

Test_Bank_Model <- data.frame(
    Region = c("South", "Mid West", "North East", "South", "North East", "West", "North East", "North East", "South"),
    Closing.Date = c(rep(2020, 7), 2019, 2019),
    Annual.Average.FedFunds = c(rep(0.2328571, 7), 1.7366667, 1.7366667),
    Open.Closed = c("Closed", "Closed", "Open", "Open", "Open", "Open", "Open", "Closed", "Closed"))

# Every column except the 3rd (Annual.Average.FedFunds) is converted to a factor
col_names <- colnames(Test_Bank_Model)[-3]

Test_Bank_Model[, col_names] <- as.data.frame(lapply(Test_Bank_Model[, col_names], FUN = as.factor))

str(Test_Bank_Model)
# 'data.frame': 9 obs. of  4 variables:
#  $ Region                 : Factor w/ 4 levels "Mid West","North East",..: 3 1 2 3 2 4 2 2 3
#  $ Closing.Date           : Factor w/ 2 levels "2019","2020": 2 2 2 2 2 2 2 1 1
#  $ Annual.Average.FedFunds: num  0.233 0.233 0.233 0.233 0.233 ...
#  $ Open.Closed            : Factor w/ 2 levels "Closed","Open": 1 1 2 2 2 2 2 1 1

# One column on the left of ~, bare column names on the right
tree <- rpart(Annual.Average.FedFunds ~ Open.Closed + Region,
    data = Test_Bank_Model,
    method = "class",
    model = TRUE,
    control = rpart.control(minsplit = 1))
rpart.plot(tree, roundint = FALSE, box.palette = "white")
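If what you actually want to predict is whether a bank is open or closed, a sketch along these lines may fit better. It assumes Open.Closed is the response with Region and Closing.Date as predictors, and uses all ten rows from the question. The name bank, leaving out Annual.Average.FedFunds, and the rpart.control() settings (minsplit = 2 so a tiny data set can split, xval = 0 to skip cross-validation) are illustrative choices, not something the question specifies:

```r
library(rpart)

# Ten rows from the question's data; Annual.Average.FedFunds omitted (assumption)
bank <- data.frame(
  Region = c("South", "Mid West", "North East", "South", "North East",
             "West", "North East", "North East", "South", "Mid West"),
  Closing.Date = c(rep(2020, 7), rep(2019, 3)),
  Open.Closed = c("Closed", "Closed", "Open", "Open", "Open",
                  "Open", "Open", "Closed", "Closed", "Closed")
)
# bank[] <- ... keeps bank a data frame while turning every column into a factor
bank[] <- lapply(bank, factor)

fit <- rpart(Open.Closed ~ Region + Closing.Date,
             data = bank, method = "class",
             control = rpart.control(minsplit = 2, xval = 0))
print(fit)

# Predicted class for each row, compared with the observed class
pred <- predict(fit, newdata = bank, type = "class")
table(predicted = pred, actual = bank$Open.Closed)
```

The tree can then be drawn with rpart.plot(fit) exactly as above.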
