R-script debugging for lda and predict


The assignment question is: perform a linear discriminant analysis (LDA) on the training data to construct a classification rule (discriminant function) for tree type based on all available continuous measurement variables. Use cross-validation (CV=TRUE), and calculate the misclassification rate (MCR) for this model using a contingency table (i.e., the table function). Summarize the linear discriminant functions.

The starter script I was given is:

# A4: Classification based on Roosevelt Forest Trees dataset
library(MASS)
### change the following line to point to your CSV file:
filename<-"trees_sample.csv"
# read the data and pre-process, set TreeNumber as row name
trees=read.csv(filename,row.names = 1)

# or
rownames(trees) <- trees$TreeNumber
# trees[,1] = NULL

# check dimension
dim(trees)

# scale the data (numeric variables only)
trees[,1:9]=scale(trees[,1:9])

#
# 1. Divide data into training (80%) and test (20%) by doing random sample without replacement
set.seed(10101)
# Now Selecting 80% of data as sample from total 'n' rows of the data  
sample <- sample.int(n = nrow(trees), size = floor(.80*nrow(trees)), replace = F)
trees_train <- trees[sample, ]  # sample holds the training set subscripts
trees_test  <- trees[-sample, ]

# 2. Build LDA model on scaled training data
# first use all numeric predictors (i.e. not the factor Area)

# test accuracy via the misclassification rate (MCR)

# chi-sq test for overall significance of predicted classes

# use MANOVA to get Wilks test result:

# and summary.aov() to get individual contributions ?

# fit the model again without CV for prediction later 

# summarise model


# 3. Model specification and testing:
#
# determine which LD components are important using barplot

#
# Prediction of test data:
# apply full model to test data and get MCR:



# Clustering: find out how many distinct tree types we really have...
# 
# tree diagram (work on a random sample of n=1000 to speed things up):
sam=sample(seq(1,80000,1),size=1000)
hc = hclust(dist(trees_train[sam,1:10]))
hcd=as.dendrogram(hc)
plot(hcd)
# very simple dendrogram, cut at h=10
plot(cut(hcd, h = 10)$upper, main = "Upper tree of cut at h=10")
 
# use EH Ch 9 method for determining how many clusters based on iterative within groups sum of squares

 
#
# k-means fit with k = ?
 
# Centroid Plot against 1st 2 discriminant functions (explain 95%+ variations)
library(fpc) 
plotcluster(....)

The code I added is:

lda_model <- lda(Type ~ ., data = trees_train, CV = TRUE)

# Test accuracy via the misclassification rate (MCR)
lda_train_predicted <- predict(lda_model)$class
conf_matrix_train <- table(Actual = trees_train$Type, Predicted = lda_train_predicted)
mcr_train <- 1 - sum(diag(conf_matrix_train)) / sum(conf_matrix_train)

The question says to use CV = TRUE. However, when I do that, lda() returns a plain list in lda_model rather than an lda object. predict() needs an lda object, so passing lda_model to it gives the following error:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"

How can I calculate the MCR with CV = TRUE and still be able to use predict()?
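For context, here is a minimal sketch of the behaviour behind the error. It uses the built-in iris data as a stand-in for the trees data, which is an assumption for illustration only, and the variable names are hypothetical. With CV = TRUE, MASS::lda() returns a list whose class component already holds the leave-one-out predictions, so the cross-validated MCR can be tabulated from that component directly. A second fit without CV gives an lda object that predict() accepts.

```r
library(MASS)

# iris stands in for the trees data (assumption for illustration only)
set.seed(10101)
idx   <- sample.int(n = nrow(iris), size = floor(0.80 * nrow(iris)), replace = FALSE)
train <- iris[idx, ]
test  <- iris[-idx, ]

# With CV = TRUE, lda() returns a list (components include class and posterior),
# not an object of class "lda", so predict() has no method for it.
lda_cv <- lda(Species ~ ., data = train, CV = TRUE)

# The leave-one-out predicted classes are already in lda_cv$class
conf_cv <- table(Actual = train$Species, Predicted = lda_cv$class)
mcr_cv  <- 1 - sum(diag(conf_cv)) / sum(conf_cv)

# Refit without CV to get a proper "lda" object for predicting new data
lda_fit   <- lda(Species ~ ., data = train)
test_pred <- predict(lda_fit, newdata = test)$class
conf_test <- table(Actual = test$Species, Predicted = test_pred)
mcr_test  <- 1 - sum(diag(conf_test)) / sum(conf_test)

print(conf_cv)
print(mcr_cv)
print(mcr_test)
```

Applied to the script above, this would mean tabulating trees_train$Type against lda_model$class for the cross-validated MCR, and using the separate fit without CV (the starter script's "fit the model again without CV" step) for predict() on trees_test.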
