Running speedlm on weighted data with missing values

I am trying to run a linear regression on weighted data.
When using speedlm, I get an error message if there are missing values in the data.

 library(speedglm)
 sampleData <- data.frame(w = round(runif(12,0,1)),
                          target = rnorm(12,100,50),
                          predictor = c(NA, rnorm(10, 40, 10),NA))

 summary(sampleData)
       w              target          predictor    
 Min.   :0.0000   Min.   : -3.381   Min.   :22.58  
 1st Qu.:0.0000   1st Qu.: 48.321   1st Qu.:30.45  
 Median :1.0000   Median : 84.156   Median :37.09  
 Mean   :0.5833   Mean   : 92.306   Mean   :35.03  
 3rd Qu.:1.0000   3rd Qu.:119.891   3rd Qu.:41.96  
 Max.   :1.0000   Max.   :223.896   Max.   :43.48  
                                    NA's   :2
 #run linear regression without weights
 linearNoWeights <- lm(formula("target~predictor"), data = sampleData)
 speedLinearNoWeights <- speedlm(formula("target~predictor"), data = sampleData)

 #run linear regression with weights
 linearWithWeights <- lm(formula("target~predictor"), data = sampleData, weights = sampleData[, "w"])
 speedLinearWithWeights <- speedlm(formula("target~predictor"), data = sampleData, weights = sampleData[, "w"])
Error in base::crossprod(x, y) : non-conformable arguments
In addition: Warning messages:
1: In sqw * X :
  longer object length is not a multiple of shorter object length
2: In sqw * y :
  longer object length is not a multiple of shorter object length
Called from: base::crossprod(x, y)
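The warnings most likely point at the cause: with the default na.action (na.omit), the two rows with a missing predictor are dropped from the model frame, while the weights vector passed as sampleData[,"w"] still has 12 entries, so the weights no longer line up with the rows of X and y. A base-R sketch of that mismatch (it does not call speedlm itself; set.seed(1) is added only so the run is repeatable):

```r
# Reproduce the question's data; set.seed() is added only for repeatability.
set.seed(1)
sampleData <- data.frame(w = round(runif(12, 0, 1)),
                         target = rnorm(12, 100, 50),
                         predictor = c(NA, rnorm(10, 40, 10), NA))

# Build a model frame with the default na.action (na.omit), as lm() does:
# the two rows with a missing predictor are dropped here.
mf <- model.frame(target ~ predictor, data = sampleData, na.action = na.omit)

nRows       <- nrow(mf)                        # rows that reach the design matrix
nWeights    <- length(sampleData[, "w"])       # weights passed as a separate vector
droppedRows <- as.integer(attr(mf, "na.action"))  # row indices removed by na.omit

cat("rows in model frame:", nRows, "\n")
cat("length of weights:  ", nWeights, "\n")
cat("dropped rows:       ", droppedRows, "\n")
```

lm() avoids this because it evaluates weights inside its own model frame, so na.omit removes the matching weights as well.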

Is there any way around this that does not force me to fix the data before running the regression?

Answer by Kumar Manglam

Try changing the na.action option. Below is your code, which runs for me once na.action is set to na.exclude or na.omit.

library(speedglm)
sampleData <- data.frame(w = round(runif(12,0,1)),
                         target = rnorm(12,100,50),
                         predictor = c(NA, rnorm(10, 40, 10),NA))
summary(sampleData)

linearNoWeights <- lm(formula("target~predictor"), data = sampleData)
speedLinearNoWeights <- speedlm(formula("target~predictor"), data = sampleData)

options(na.action="na.exclude") # or "na.omit"

linearNoWeights <- lm(formula("target~predictor"), data = sampleData)
speedLinearNoWeights <- speedlm(formula("target~predictor"), data = sampleData)
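For the weighted call from the question, one further option is to drop incomplete rows from the data and from the weights with the same index before calling speedlm. This is a minimal sketch, assuming the weights are a numeric vector with one entry per row of data and that the formula names its variables explicitly (no "."). The helpers complete_rows() and speedlm_complete() are hypothetical names, not part of speedglm. The caller still passes the raw data; the cleaning happens inside the wrapper:

```r
# Hypothetical helper (not part of speedglm): TRUE for rows with no missing
# value in the formula's variables and no missing weight.
complete_rows <- function(formula, data, weights) {
  vars <- all.vars(formula)  # assumes the formula has no "."
  complete.cases(data[, vars, drop = FALSE]) & !is.na(weights)
}

# Hypothetical wrapper: subset the data and the weights with the same index,
# so speedlm() receives one weight per row it actually fits.
speedlm_complete <- function(formula, data, weights) {
  keep <- complete_rows(formula, data, weights)
  speedglm::speedlm(formula, data = data[keep, , drop = FALSE],
                    weights = weights[keep])
}

set.seed(1)
sampleData <- data.frame(w = round(runif(12, 0, 1)),
                         target = rnorm(12, 100, 50),
                         predictor = c(NA, rnorm(10, 40, 10), NA))

keep <- complete_rows(target ~ predictor, sampleData, sampleData[, "w"])

# Only attempt the fit if speedglm is installed.
if (requireNamespace("speedglm", quietly = TRUE)) {
  speedLinearWithWeights <- speedlm_complete(target ~ predictor,
                                             data = sampleData,
                                             weights = sampleData[, "w"])
  print(coef(speedLinearWithWeights))
}
```

Subsetting the weights with the same logical index as the data is what keeps their lengths equal; lm() gets the same effect automatically through its model frame.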

See the documentation for na.omit and na.exclude (?na.omit) to decide which one fits your use case. Hope this helps.
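The practical difference between the two shows up in per-row results: with na.omit, residuals() and fitted() have one entry per row used in the fit, while with na.exclude they are padded with NA back to the length of the original data. The coefficients are the same either way. A short sketch with lm(), passing na.action directly instead of through options() (speedlm is not used here):

```r
set.seed(1)
sampleData <- data.frame(w = round(runif(12, 0, 1)),
                         target = rnorm(12, 100, 50),
                         predictor = c(NA, rnorm(10, 40, 10), NA))

# Same model, two NA policies passed directly to lm()
fitOmit    <- lm(target ~ predictor, data = sampleData, na.action = na.omit)
fitExclude <- lm(target ~ predictor, data = sampleData, na.action = na.exclude)

# na.omit: one residual per row used in the fit
residOmit <- residuals(fitOmit)
# na.exclude: residuals are padded with NA at the dropped rows
residExclude <- residuals(fitExclude)

cat("na.omit residuals:   ", length(residOmit), "\n")     # 10
cat("na.exclude residuals:", length(residExclude), "\n")  # 12
cat("NA positions:        ", which(is.na(residExclude)), "\n")
```

na.exclude is the more convenient choice when you want to bind residuals or fitted values back onto the original data frame row by row.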