R / plm extract residuals by index

2.6k views Asked by At

I have a plm object created using:

require(plm)
plm1 <- plm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris, index = "Species")

I'm trying to extract the residuals to manually calculate r-squared by Species by can't seem to manipulate the pseries object into something useable like a matrix or data.frame.

> data.frame(resid(plm1))
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class '"pseries"' into a data.frame

It would be nice if I had something like:

> df1 <- data.frame(time = rep(1:10,15), Species = iris$Species, resid1 = runif(150))
> head(df1)
  time Species    resid1
1    1  setosa 0.7038776
2    2  setosa 0.2164597
3    3  setosa 0.1988884
4    4  setosa 0.9311872
5    5  setosa 0.7087211
6    6  setosa 0.9914357

That I could use ddply or aggregate on to find the rsquared for each Species.

Any suggestions?

2

There are 2 answers

0
dickoa On

May be something along these lines will do the trick

library(plm)
plm1 <- plm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris, index = "Species")
res <- residuals(plm1)
df <- cbind(as.vector(res), attr(res, "index"))
names(df) <- c("resid", "species", "time")
str(df)
## 'data.frame':    150 obs. of  3 variables:
##  $ resid  : num  0.1499 -0.0501 -0.1595 -0.4407 0.0499 ...
##  $ species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ time   : Factor w/ 50 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
0
Phil On

This is an old question, but I would like to point out something that is easy to miss and that can result in serious errors. The previous answer by dickoa is correct, but I thought I'd clarify why such a work-around is needed, since it might not be obvious.

While reading another thread I learned the following: As noted here, plm does not necessarily keep data in the same order it was given to the function. This means that simply using the residuals() function on an plm-object and then joining it to your data can lead to the wrong residuals grouped to the wrong row of the data if you are not careful! As an example consider the following:

require(plm)
data("Gasoline") # The Gasoline dataset from the plm package

plm1 <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data=Gasoline, method = "within", index = c("country", "year"))

coef(plm1)
  lincomep      lrpmg   lcarpcap 
 0.6622497 -0.3217025 -0.6404829 

head(residuals(plm1))
          1           2           3           4           5           6 
-0.18814207 -0.19642727 -0.14874420 -0.12476346 -0.12114060 -0.08684045 

Note the residuals that we were given. Now let us just change the order in which the dataset is ordered. This shouldn't change anything in the analysis.

set.seed(1234)
Gasoline2 <- Gasoline[order(runif(nrow(Gasoline))), ] # We just change the order of the rows.

plm2 <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data=Gasoline2, method = "within", index = c("country", "year"))

coef(plm2)
  lincomep      lrpmg   lcarpcap 
 0.6622497 -0.3217025 -0.6404829 

head(residuals(plm2))
        258           7          64          73         268         186 
-0.18814207 -0.19642727 -0.14874420 -0.12476346 -0.12114060 -0.08684045 

At first glance this seems fine; the estimated coefficients are the same as before. However, note that the order that the residuals are presented in are the same as before we shifted the rows around. The only thing that's changed is that the names associated to the residuals now reflect their new position in the data. So the observation that post-reordering is on row 1 in the data, was pre-reordering on row 258.

Gasoline2[1, ]
    country year lgaspcar lincomep     lrpmg  lcarpcap
258  SWEDEN 1970 3.989372 -7.73261 -2.733592 -8.164506

Gasoline[258, ]
    country year lgaspcar lincomep     lrpmg  lcarpcap
258  SWEDEN 1970 3.989372 -7.73261 -2.733592 -8.164506

This means that if we had Gasoline2 as our dataset that we were working with, then using a function like cbind() on Gasoline2 and residuals(plm2) would lead to the wrong residuals being connected to the observations.

head(cbind(Gasoline, residuals(plm1)))
  country year lgaspcar  lincomep      lrpmg  lcarpcap residuals(plm1)
1 AUSTRIA 1960 4.173244 -6.474277 -0.3345476 -9.766840     -0.18814207
2 AUSTRIA 1961 4.100989 -6.426006 -0.3513276 -9.608622     -0.19642727
3 AUSTRIA 1962 4.073177 -6.407308 -0.3795177 -9.457257     -0.14874420
4 AUSTRIA 1963 4.059509 -6.370679 -0.4142514 -9.343155     -0.12476346
5 AUSTRIA 1964 4.037689 -6.322247 -0.4453354 -9.237739     -0.12114060
6 AUSTRIA 1965 4.033983 -6.294668 -0.4970607 -9.123903     -0.08684045

head(cbind(Gasoline2, residuals(plm2)))
     country year lgaspcar  lincomep      lrpmg  lcarpcap residuals(plm2)
258   SWEDEN 1970 3.989372 -7.732610 -2.7335921 -8.164506     -0.18814207
7    AUSTRIA 1966 4.047537 -6.252545 -0.4668377 -9.019822     -0.19642727
64   DENMARK 1966 4.233643 -5.851866 -0.3961885 -8.681541     -0.14874420
73   DENMARK 1975 4.033015 -5.612967 -0.3939543 -8.274632     -0.12476346
268 SWITZERL 1961 4.441330 -6.111640 -0.8655847 -9.158229     -0.12114060
186    JAPAN 1974 4.007964 -5.852553 -0.1909064 -8.846520     -0.08684045

As we see above, the residuals are assigned to the wrong row in the Gasoline2 example.

So what's going on? Well, as alluded to earlier, plm does not preserve the order of the observations. Using the attr() function dickoa pointed out in a previous answer, we can see that plm reorganizes the data by country and year.

head( attr(residuals(plm2), "index") )
  country year
1 AUSTRIA 1960
2 AUSTRIA 1961
3 AUSTRIA 1962
4 AUSTRIA 1963
5 AUSTRIA 1964
6 AUSTRIA 1965

This is how the original Gasoline data was structured, so that's why the residuals are presented in the same order.

We can thus use the fact that attr(residuals(plm2), "index") gives us the residuals and their corresponding country and year indicators, in order to add the residuals to the original data. As pointed out here, the plyr package is very helpful for this.

require(plyr)
resids2 <- data.frame(residual = residuals(plm2), attr(residuals(plm2), "index"))
Gasoline2$year <- factor(Gasoline2$year) # Needed since resids2$year is a factor, and Gasoline2$years was an integer. plyr does not accept them to be of different types.
Gasoline2 <- join(Gasoline2, resids2, by = c("country", "year"))

head(Gasoline2)
   country year lgaspcar  lincomep      lrpmg  lcarpcap    residual
1   SWEDEN 1970 3.989372 -7.732610 -2.7335921 -8.164506 -0.02468148
2  AUSTRIA 1966 4.047537 -6.252545 -0.4668377 -9.019822 -0.02479759
3  DENMARK 1966 4.233643 -5.851866 -0.3961885 -8.681541  0.03175032
4  DENMARK 1975 4.033015 -5.612967 -0.3939543 -8.274632 -0.06575219
5 SWITZERL 1961 4.441330 -6.111640 -0.8655847 -9.158229 -0.05789130
6    JAPAN 1974 4.007964 -5.852553 -0.1909064 -8.846520 -0.21957156

Which gives us the correct result.