I would like to run a discrete choice analysis with an individual-specific variable and what I think are alternative-specific attribute variables. From the mlogit vignette I think the individual-specific variable is a "choice situation specific covariate" (in the new vignette) and the alternative-specific attribute variables are "alternative specific covariates with generic coefficients" (again, in the new vignette). The alternative-specific attribute variables should not have differing impacts for the different alternatives, so I believe a generic coefficient that applies to all alternatives is in order.
Let's use the Fishing dataset as an example.
library(mlogit)
data(Fishing)
Fish1 <- dfidx(Fishing, varying=2:9, choice="mode", idnames=c("chid", "alt"),
               drop.index=FALSE)
Fish1
... which gets us:
~~~~~~~
first 10 observations out of 4728
~~~~~~~
mode income alt price catch chid idx
1 FALSE 7083.332 beach 157.930 0.0678 1 1:each
2 FALSE 7083.332 boat 157.930 0.2601 1 1:boat
3 TRUE 7083.332 charter 182.930 0.5391 1 1:rter
4 FALSE 7083.332 pier 157.930 0.0503 1 1:pier
5 FALSE 1250.000 beach 15.114 0.1049 2 2:each
6 FALSE 1250.000 boat 10.534 0.1574 2 2:boat
7 TRUE 1250.000 charter 34.534 0.4671 2 2:rter
8 FALSE 1250.000 pier 15.114 0.0451 2 2:pier
9 FALSE 3750.000 beach 161.874 0.5333 3 3:each
10 TRUE 3750.000 boat 24.334 0.2413 3 3:boat
And then we fit the model:
(fit1 <- mlogit(mode ~ price+catch | income | 1, data=Fish1))
... which gets us:
Call:
mlogit(formula = mode ~ price + catch | income | 1, data = Fish1, method = "nr")
Coefficients:
(Intercept):boat (Intercept):charter (Intercept):pier price
0.527278790 1.694365710 0.777959401 -0.025116570
catch income:boat income:charter income:pier
0.357781958 0.000089440 -0.000033292 -0.000127577
So far so good.
Now let's recode the values of price and catch (the alternative-specific attribute variables) so that they vary across alternatives but not across individuals:
Fishing2 <- Fishing
Fishing2$price.beach <- 50
Fishing2$price.pier <- 100
Fishing2$price.boat <- 150
Fishing2$price.charter <- 200
Fishing2$catch.beach <- .2
Fishing2$catch.pier <- .5
Fishing2$catch.boat <- .75
Fishing2$catch.charter <- .87
Fish2 <- dfidx(Fishing2, varying=2:9, choice="mode", idnames=c("chid", "alt"),
               drop.index=FALSE)
Fish2
... which gets us:
~~~~~~~
first 10 observations out of 4728
~~~~~~~
mode income alt price catch chid idx
1 FALSE 7083.332 beach 50 0.20 1 1:each
2 FALSE 7083.332 boat 150 0.75 1 1:boat
3 TRUE 7083.332 charter 200 0.87 1 1:rter
4 FALSE 7083.332 pier 100 0.50 1 1:pier
5 FALSE 1250.000 beach 50 0.20 2 2:each
6 FALSE 1250.000 boat 150 0.75 2 2:boat
7 TRUE 1250.000 charter 200 0.87 2 2:rter
8 FALSE 1250.000 pier 100 0.50 2 2:pier
9 FALSE 3750.000 beach 50 0.20 3 3:each
10 TRUE 3750.000 boat 150 0.75 3 3:boat
It seems to me that this is like a one-choice product comparison: each of the alternatives has a fixed set of attributes (alternative-specific attribute variables with generic coefficients) that may influence an individual's decision. The individual's income, the individual-specific (or choice situation-specific, in the new vignette's terms) variable, might affect the decision as well, although its coefficient must vary by alternative, as the vignette shows.
BUT, when I try to run the model for the Fish2 dataset, it fails:
fit2 <- mlogit(mode ~ price+catch | income | 1, data=Fish2)
Error in solve.default(H, g[!fixed]) :
system is computationally singular: reciprocal condition number = 3.18998e-23
I'm guessing that the fact that the alternative-specific attribute variables do not vary across choice situations is the problem, but I do not understand why, or how to fix it. It SEEMS to me like I should be able to analyze this situation with mlogit.
The error message you get is often the result of insufficient variation in the data. With insufficient variation, the Hessian matrix (the negative of the information matrix) becomes singular and cannot be inverted, so the Newton-Raphson step cannot be computed and you cannot get standard errors. There are many existing answers on this particular error message.
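In your second example, if I understand correctly, each alternative has the same attributes for all individuals, which means that price and catch take only four distinct combinations of values, one for each fishing mode. However many times you observe each one, you still have only 4 unique observations of these variables, yet you are trying to fit 8 parameters. In particular, a variable that is constant within each alternative is perfectly collinear with the alternative-specific intercepts, so the coefficients on price and catch cannot be separated from the three intercepts. This is in all likelihood why your model fails.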
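To make the collinearity concrete, here is a minimal base-R sketch. It is not what mlogit does internally: it builds an ordinary design matrix with alternative dummies plus the fixed price and catch values from Fish2, and checks its rank. The data frame `alts`, the stacked copy `long`, and the choice of five repeated individuals are illustrative assumptions, not part of the original data.

```r
# Four alternatives with the fixed attribute values used in Fish2
# (hypothetical helper data frame, for illustration only)
alts <- data.frame(
  alt   = factor(c("beach", "boat", "charter", "pier")),
  price = c(50, 150, 200, 100),
  catch = c(0.20, 0.75, 0.87, 0.50)
)

# Stack the same four rows for five hypothetical individuals
long <- alts[rep(1:4, times = 5), ]

# Intercept + three alternative dummies + price + catch
X <- model.matrix(~ alt + price + catch, data = long)

n_cols <- ncol(X)     # number of coefficients we are asking for
x_rank <- qr(X)$rank  # number of linearly independent columns

cat("columns:", n_cols, " rank:", x_rank, "\n")
```

The matrix has more columns than its rank, because price and catch are exact linear combinations of the alternative dummies. Under these assumptions, the same redundancy is what makes the Hessian singular in the mlogit fit: the alternative-specific intercepts already absorb everything that alternative-constant attributes could explain.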