So I am trying to replicate a stata function I saw in Priciples of Econometrics, by Hill, Griffiths and Lim. The function I want to replicate looks like this in stata;
lincom _cons + b_1 * [arbitrary value] - c
This is to the null hypothesis H0 : B0 + B1*X = C
I am able to test a hypothesis without the constant, but I would like to add the constant when testing the linear combination of parameters. I browsed the package documentation for glht()
but it only had an example whereby they took out the constant. I replicated the example, keeping the constant, but I am unsure how to test for a linear combination when you have a matrix K, and a constant. For reference, here is their example;
### multiple linear model, swiss data
lmod <- lm(Fertility ~ ., data = swiss)
### test of H_0: all regression coefficients are zero
### (ignore intercept)
### define coefficients of linear function directly
K <- diag(length(coef(lmod)))[-1,]
rownames(K) <- names(coef(lmod))[-1]
K
### set up general linear hypothesis
glht(lmod, linfct = K)
I am not really good at creating pretend data sets, but here is my try.
library(multcomp)
test.data = data.frame(test.y = seq(200,20000,1000),
test.x = seq(10,1000,10))
test.data$test.y = sort(test.data$test.y + rnorm(100, mean = 10000, sd = 100)) -
rnorm(100, mean = 5733, sd = 77)
test.lm = lm(test.y ~ test.x, data = test.data)
# to view the name of the coefficients
coef(test.lm)
# this produces an error. How can I add this intercept?
glht(test.lm,
linfct = c("(Intercept) + test.x = 20"))
It seems that there are two ways to go about this, according to the documentation. I can either use the function diag() to construct a matrix, which I can then use in the linfct =
argument, or I can use a character string. The thing with this method, is that I do not quite know how to use the diag() method while also including the constant (right hand side of the equation); in the case of the character string method, I am not sure how to add the intercept.
Any and all help would be greatly appreciated.
Here is the data I am working with. This was originally in a .dta file, so I apologize for the horrible formatting. Per the book I mentioned above, this is the food.dta file.
structure(list(food_exp = structure(c(115.22, 135.98, 119.34,
114.96, 187.05, 243.92, 267.43, 238.71, 295.94, 317.78, 216,
240.35, 386.57, 261.53, 249.34, 309.87, 345.89, 165.54, 196.98,
395.26, 406.34, 171.92, 303.23, 377.04, 194.35, 213.48, 293.87,
259.61, 323.71, 275.02, 109.71, 359.19, 201.51, 460.36, 447.76,
482.55, 438.29, 587.66, 257.95, 375.73), label = "household food expenditure per week", format.stata = "%10.0g"),
income = structure(c(3.69, 4.39, 4.75, 6.03, 12.47, 12.98,
14.2, 14.76, 15.32, 16.39, 17.35, 17.77, 17.93, 18.43, 18.55,
18.8, 18.81, 19.04, 19.22, 19.93, 20.13, 20.33, 20.37, 20.43,
21.45, 22.52, 22.55, 22.86, 24.2, 24.39, 24.42, 25.2, 25.5,
26.61, 26.7, 27.14, 27.16, 28.62, 29.4, 33.4), label = "weekly household income", format.stata = "%10.0g")), .Names = c("food_exp","income"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -40L))
Let's load the data from your book then check our results as we go along, to make sure we're getting the same thing. I'm presenting the answer to you this way in part because it helped me to understand exactly what you were wanting and in part to convince you of the equivalency here.
Part of the confusion for me was in the syntax of your
lincom
example. Your syntax may have been correct, I have no idea, but based on how it looked I thought you were doing something different, thus referring to your book really helped.First, let's load the data and run the linear model that's on Page 115:
So far, so good. On Pg. 115 of the 4th edition it shows the same regression model other than some minor rounding differences.
Next, the book calculates a point estimate of weekly food expenditure, conditional on household income of 20 (which is $2,000/wk):
Again, we get the exact same result. Incidentally, you can also see the same results in Stata in the nice free sample of the book Using Stata for Principles of Econometrics 4th ed. by Wiley.
Finally, we're ready for the hypothesis testing. As mentioned, I want to make sure I can replicate exactly what Stata had. You kindly provided your code, but I was a little confused about your syntax.
Fortunately, we've gotten lucky. While the preview of the 4th edition Stata guide only goes through Chapter 2, the Economics and Business faculty of a university in Holland were able to get parts of an old edition made available DRM free, so we can refer to that:
and finally see that we can replicate it in R like this:
Don't be fooled by the different output format. The
estimate
it's showing you in the R code is simply the coefficient ("b2") onincome
from the regression model. It doesn't change depending on the hypothesis test, whereas in the Stata output they do "b2 - 15" (which in R ismod$coefficients[2]-15
).What does change are the t (
t value
) and p (Pr(>|t|)
) values. Notice that the these test statistics from R match those from Stata.There was another example with a H0 of income = 7.5 Let's see that the t value is 1.29 and p value is .203 in both R and Stata:
You can also get the confidence intervals with
confint()
.Finally I think you were looking at section 3.6.4 (pg 117) of your book, where an executive wanted to test the hypothesis that, give
income
of 20 ($2000/wk) thefood_exp
is > 250:We can calculate the t-value in R as:
Where the formula is the same as on the preceding 2 pages of the book.
We can even make this into a custom function (works for simple linear regression, meaning 1 independent variable only):