First of all: this question might be a duplicate of, or already answered in, this Stack Overflow post.
I want to use the `MatchIt` package to perform fully blocked matching in my dataset using the Mahalanobis distance. I have two observed covariates (age and sex) that I want to use for matching.
I know that I can perform Mahalanobis-based matching using the following arguments:
    library(MatchIt)
    formula <- as.formula("group ~ sex_boolean + age")
    m.out <- matchit(formula = formula,
                     data = data_df,
                     distance = 'mahalanobis')
    site_df_matched <- get_matches(m.out, data = data_df)
But this only performs Mahalanobis-based matching using the nearest neighbor. What if I want to be even stricter? Is it possible to introduce a caliper to Mahalanobis matching? The idea would be the following: for each unit in the minority group, find the unit in the majority group whose Mahalanobis distance to it is smallest and lies within a defined caliper. If there is no such unit in the majority group, the respective unit from the minority group should be discarded.
The outcome should be treatment and control groups of equal sizes containing pairs of units that are close in the respective covariates. The 'closeness' should be controllable by how strictly the caliper is set. Stricter calipers would lead to more discarded units from the minority group.
Maybe I also misunderstand the Mahalanobis-based matching procedure, but is it possible (and recommended) to do this with `MatchIt`?
Yes, this is straightforward using `MatchIt` version 4.0.0 and greater. If you want to match on the Mahalanobis distance but include a propensity score caliper, the `distance` argument needs to correspond to the propensity score, and the `mahvars` argument controls which covariates the Mahalanobis distance matching is performed on. For example, to perform Mahalanobis distance matching on `sex` and `age` after estimating a propensity score that contained other variables (e.g., `race` and `educ`) in addition to these two, you would put all four variables in the main formula, request a propensity score through `distance`, set `caliper`, and pass `sex` and `age` to `mahvars`.

If you want to perform Mahalanobis distance matching without involving a propensity score, set `distance = "mahalanobis"` and put only the matching covariates in the formula.
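The two syntaxes described above could look roughly like the sketch below. The original code blocks are not reproduced here, so this is a reconstruction under assumptions: `data_df` is simulated, the column values are made up, the object names `m.ps` and `m.mah` are hypothetical, and the caliper width of 0.25 on the propensity score is only an illustrative choice.

```r
library(MatchIt)

# Simulated stand-in for data_df (hypothetical values, for illustration only)
set.seed(123)
n <- 300
data_df <- data.frame(
  group = rbinom(n, 1, 0.3),                  # 1 = minority (treated) group
  sex   = rbinom(n, 1, 0.5),
  age   = round(rnorm(n, mean = 40, sd = 10)),
  race  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  educ  = sample(8:18, n, replace = TRUE)
)

# First syntax: estimate a propensity score from all four covariates,
# apply a caliper to that score (an unnamed caliper refers to the
# propensity score), and do the actual pairing on the Mahalanobis
# distance computed from sex and age only.
m.ps <- matchit(group ~ sex + age + race + educ,
                data = data_df,
                method = "nearest",
                distance = "glm",
                caliper = 0.25,
                mahvars = ~ sex + age)

# Second syntax: no propensity score at all; pair directly on the
# Mahalanobis distance computed from the covariates in the formula.
m.mah <- matchit(group ~ sex + age,
                 data = data_df,
                 method = "nearest",
                 distance = "mahalanobis")

print(m.ps)
print(m.mah)
```

In both calls the group coded 1 is treated as the group for which matches are sought, so the minority group should carry that code.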
If you need to estimate a propensity score for any reason (e.g., a caliper or common support), you must use the first syntax. If no propensity score is involved, the second syntax works. You can still place calipers on the pairs with the second syntax as long as the calipers are on other supplied variables; for example, to place a caliper of .25 standard deviations of `age`, you could enter `caliper = c(age = .25)`. You can place calipers on multiple variables at a time, including the propensity score if the first syntax is used.

This is all detailed in the help page for nearest neighbor matching, which can be reviewed here or with `?method_nearest`.
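As a rough illustration of a covariate caliper with the second syntax, the sketch below matches on the Mahalanobis distance while restricting how far apart paired units may be in `age`. The data are simulated, the object names (`m.std`, `m.raw`, `matched`, `age_gap`) are hypothetical, and the raw-unit caliper of 2 years is an arbitrary choice added only so the result is easy to check by hand.

```r
library(MatchIt)

# Simulated stand-in for data_df (hypothetical values, for illustration only)
set.seed(42)
n <- 300
data_df <- data.frame(
  group = rbinom(n, 1, 0.3),                  # 1 = minority (treated) group
  sex   = rbinom(n, 1, 0.5),
  age   = round(rnorm(n, mean = 40, sd = 10))
)

# Caliper of 0.25 standard deviations of age (std.caliper = TRUE is the default)
m.std <- matchit(group ~ sex + age,
                 data = data_df,
                 distance = "mahalanobis",
                 caliper = c(age = 0.25))

# Same idea in raw units: paired units may differ by at most 2 years of age
m.raw <- matchit(group ~ sex + age,
                 data = data_df,
                 distance = "mahalanobis",
                 caliper = c(age = 2),
                 std.caliper = FALSE)

# One row per matched unit; 'subclass' identifies the pair
matched <- get_matches(m.raw, data = data_df)

# Treated units without a control inside the caliper are dropped,
# so both groups end up the same size
table(matched$group)

# Largest within-pair age difference; should not exceed the caliper
age_gap <- tapply(matched$age, matched$subclass, function(a) diff(range(a)))
max(age_gap)
```

Tightening the caliper (a smaller number) should discard more units from the minority group, which is the trade-off described in the question.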