R pairwise PCA function converts X to a non-numeric object

Question


132 views · Asked by benalbert342 on 06 January 2025 at 02:40

I am writing a function that performs PCA on pairs of variables in an xts object until the correlation between all of the variables is less than 0.1. Here is the function that I wrote:


PCA_Selection <- function(X, r=0.1){

  M <- cor(X) # Creating the correlation matrix
  M[M==1] <- 0 # filling the diagonal with 0s so that pairs of the same variable are not considered
  while(max(abs(M)) > r){
    M <- cor(X)
    PCA_vars <- matrix(,nrow = (nrow(M))^2 ,ncol = 2)
    for(i in 1:ncol(M)){ # Selects variables that will be used for PCA
      for(j in 1:nrow(M)){
        if(M[j,i] > r & M[j,i] < 1){
          PCA_vars[c(i*j),] <- c(row.names(M)[i],colnames(M)[j])
        }}} # works 
    PCA_vars <- na.omit(PCA_vars) # works 
    for (i in 1:nrow(PCA_vars)) {
      PCA_pre <- prcomp(X[,c(names(X) %in% PCA_vars[i,])]) 
      Sum_PCA <- summary(PCA_pre)
      tmp <- data.frame()
      if (Sum_PCA[["importance"]][2,1] > 0.95){ # if the first component captures 95% of variance
        tmp <- data.frame(predict(PCA_pre, X)[,1]) # then only use the first component for predictions 
        names(tmp) <- c(paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_1"))
      }else { # else keep both components and do not reduce the dimensions
        tmp <- predict(PCA_pre,X)
        colnames(tmp) <- c(paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_1"), 
                        paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_2"))
      }
      Xnew <- cbind(X,tmp)
      X <- Xnew
    }

    PCA_vars <- unique(as.vector(PCA_vars)) # Variables to be removed 
    X <- X[, -which(colnames(X) %in% PCA_vars)]

    M <- cor(X)
    M[M==1] <- 0
  }  
    return(Xnew)
}

However, when I run the function, R returns a strange error:

Error in colMeans(x, na.rm = TRUE): 'x' must be numeric

The data that I am testing the function with is an xts object that does not have any missing observations. Furthermore, all of the variables have non-zero variance and there are only continuous numeric variables in the data.


1 Answer

**Edward** · Answer 1 · 2020-03-22T07:44:43+00:00

The error occurs at line 15: PCA_pre <- prcomp(X[,c(names(X) %in% PCA_vars[i,])])

Actually, this works on the first run, when i=1. But it fails on the second run when i=2 for the following reason.

On line 27 you overwrite X by assigning Xnew to it:

27: X <- Xnew

which is created on line 26:

26: Xnew <- cbind(X,tmp)

which I can't quite get my head around. Anyway, tmp is assigned on line 19 (if the principal component captures > 0.95 of the total variance) or on line 22 (if it doesn't).

19: tmp <- data.frame(predict(PCA_pre, X)[,1])
22: tmp <- predict(PCA_pre,X)

This also befuddles me because on line 19 tmp will have a "data.frame" class while on line 22 it will have class "matrix". This is important later when you create the Xnew object on line 26 (see above). If tmp is a data frame, then Xnew will be a "matrix", which has no names attribute:
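The class difference between the two branches can be checked on a small example. This is only a sketch: it uses a plain numeric matrix `m` with made-up values rather than an xts object, so it shows what the two assignments return, not how cbind() behaves with xts.

```r
# Hypothetical toy data: a plain numeric matrix standing in for the asker's X
set.seed(1)
m <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("a", "b")))

pca <- prcomp(m)

tmp_df  <- data.frame(predict(pca, m)[, 1])  # same form as line 19
tmp_mat <- predict(pca, m)                   # same form as line 22

is.data.frame(tmp_df)   # TRUE
is.matrix(tmp_mat)      # TRUE

names(tmp_mat)          # NULL: a matrix has no names attribute
colnames(tmp_mat)       # "PC1" "PC2"
```

So the two branches hand different kinds of object to cbind() on line 26, and only one of them answers to names().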

names(X)
NULL

And this is why you get an error on line 15 (see above); the prcomp function is attempting to run a PCA on an empty set.
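To see why a missing names attribute leaves nothing to analyse, here is a minimal base R sketch. The matrix `m` and its column names are made up for illustration. The selection mirrors the `names(X) %in% PCA_vars[i,]` pattern from line 15.

```r
# Hypothetical 3 x 2 matrix with column names
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

names(m)      # NULL: matrices carry colnames, not names
colnames(m)   # "a" "b"

# NULL %in% anything is logical(0), so no column is selected
keep <- names(m) %in% c("a", "b")
length(keep)  # 0

dim(m[, keep])                          # 3 0: zero columns remain
dim(m[, colnames(m) %in% c("a", "b")])  # 3 2: colnames() finds both
```

Using colnames() instead of names() in the selection would work for both matrices and data frames, although that alone may not be enough if X has also stopped being numeric.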

I think the solution may be to not use the data.frame() function on line 19.

19: tmp <- predict(PCA_pre, X)[,1]
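One caveat: on a plain matrix, `[,1]` returns a bare vector, so the `names(tmp) <- ...` call that follows line 19 would label elements rather than a column. A possible variation, sketched here on a made-up numeric matrix `m` (not an xts object, so the behaviour with xts is an assumption I have not verified), is to keep a one-column matrix with `drop = FALSE` and set its column name:

```r
# Hypothetical toy data standing in for X
set.seed(1)
m <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("a", "b")))
pca <- prcomp(m)

# drop = FALSE keeps a one-column matrix, which can carry a column name
tmp <- predict(pca, m)[, 1, drop = FALSE]
colnames(tmp) <- "Com_a_b_1"

Xnew <- cbind(m, tmp)
is.numeric(Xnew)   # TRUE: still a numeric matrix
colnames(Xnew)     # "a" "b" "Com_a_b_1"
```

Both branches then produce a matrix, so cbind() sees the same kind of object whichever branch ran.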

I tested this on a sample "xts" dataset but it seems to run forever. But at least there is no error.

And as an aside, line 17 could be omitted as it doesn't seem to do anything.

17: tmp <- data.frame()
