I have case patients and am trying to match controls based on age (the easy part) and ICD-10 codes using the MatchIt package in R. My problem is that there are multiple ICD-10 codes for a given patient. For example, a 20-year-old case patient may have two ICD-10 codes, and I want to find a control patient who is also 20 years old and has at least the same two ICD-10 codes (the control patient may have additional ICD-10 codes, which is fine).
   patient_id diagnosis   age status
        <dbl> <chr>     <dbl> <chr>
 1       1001 Z34          20 case
 2       1001 A24          20 case
 3       1002 N39          22 case
 4       1002 Z3A          22 case
 5       1003 N89          23 case
 6       1003 Z34          23 case
 7       1004 Z34          20 control
 8       1004 A24          20 control
 9       1005 D50          23 control
10       1005 F41          23 control
11       1005 N89          23 control
12       1005 Z11          23 control
13       1005 Z34          23 control
14       1006 Z12          22 control
15       1006 Z34          22 control
16       1006 N39          22 control
17       1007 E66          20 control
18       1007 Z11          20 control
19       1007 Z12          20 control
20       1007 Z34          20 control
Here is what I have tried:
library(MatchIt)
library(dplyr)
m.out <- matchit(I(status == "case") ~ age, data = df,
                 exact = ~age + diagnosis,
                 method = "optimal",
                 distance = "glm", ratio = 1)
m.data <- match.data(m.out, subclass = "matched_id")
print(m.data)
   patient_id diagnosis   age status  distance weights matched_id
        <dbl> <chr>     <dbl> <chr>      <dbl>   <dbl> <fct>
 1       1001 Z34          20 case       0.269       1 1
 2       1001 A24          20 case       0.269       1 2
 3       1002 N39          22 case       0.308       1 3
 4       1003 N89          23 case       0.329       1 4
 5       1003 Z34          23 case       0.329       1 5
 6       1004 A24          20 control    0.269       1 2
 7       1005 N89          23 control    0.329       1 4
 8       1005 Z34          23 control    0.329       1 5
 9       1006 N39          22 control    0.308       1 3
10       1007 Z34          20 control    0.269       1 1
As you can see, patient 1001 matched with 1007, but also matched with 1004. I only want 1001 to match with 1004 since they are both 20 years old and have the ICD codes Z34 & A24. Any help would be much appreciated.
I think the problem is that your dataset is in long format, so matchit() will try to make a match for each row rather than for each patient. The solution is to reshape the data to wide format (one row per patient), dummy-code all of the diagnoses, and then match on those columns. Keep in mind that you won't be able to match every case if you request an exact match on diagnosis.
Created on 2023-07-11 with reprex v2.0.2
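The reshaping step described above could look something like the following. This is only a minimal sketch, not necessarily the code the answer originally ran: the `dx_` column prefix, the `present` helper column, and the 0/1 `case` column are names made up for illustration, and `method = "nearest"` is swapped in for the `"optimal"` method used in the question so that the example does not depend on the optmatch package.

```r
library(dplyr)
library(tidyr)
library(MatchIt)

# Long-format example data from the question: one row per diagnosis.
df <- tibble::tribble(
  ~patient_id, ~diagnosis, ~age, ~status,
  1001, "Z34", 20, "case",
  1001, "A24", 20, "case",
  1002, "N39", 22, "case",
  1002, "Z3A", 22, "case",
  1003, "N89", 23, "case",
  1003, "Z34", 23, "case",
  1004, "Z34", 20, "control",
  1004, "A24", 20, "control",
  1005, "D50", 23, "control",
  1005, "F41", 23, "control",
  1005, "N89", 23, "control",
  1005, "Z11", 23, "control",
  1005, "Z34", 23, "control",
  1006, "Z12", 22, "control",
  1006, "Z34", 22, "control",
  1006, "N39", 22, "control",
  1007, "E66", 20, "control",
  1007, "Z11", 20, "control",
  1007, "Z12", 20, "control",
  1007, "Z34", 20, "control"
)

# Reshape to one row per patient with a 0/1 indicator per ICD-10 code.
# "present", "dx_" and "case" are illustrative names, not from the question.
df_wide <- df %>%
  mutate(present = 1L) %>%
  pivot_wider(
    id_cols = c(patient_id, age, status),
    names_from = diagnosis,
    values_from = present,
    values_fill = 0L,
    names_prefix = "dx_"
  ) %>%
  mutate(case = as.integer(status == "case")) %>%
  as.data.frame()

# Names of all diagnosis indicator columns.
dx_cols <- grep("^dx_", names(df_wide), value = TRUE)

# Match patients (not rows), requiring identical age and identical indicators.
# Cases in strata without a control stay unmatched (MatchIt may warn about this).
m.out <- matchit(case ~ age, data = df_wide,
                 method = "nearest", distance = "glm", ratio = 1,
                 exact = c("age", dx_cols))

m.data <- match.data(m.out, subclass = "matched_id")
print(m.data[, c("patient_id", "age", "status", "matched_id")])
```

In the sample data, 1001 and 1004 are the only case and control with the same age and exactly the same set of codes, so that is the only pair this approach can form. Note that exact matching on every indicator is stricter than the "at least the same codes" rule in the question: 1005 has both of 1003's codes and the same age, but it also has extra codes, so the two are not paired here.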
I've assumed this is a typo, and you mean 1007.
Also, it's a good idea when you post a question to make it easy for others to copy the data. The dput() command can be useful for this.
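As a small illustration of that suggestion, here is a sketch using a few rows of the example data (`df_small` is a made-up name). `dput()` prints an R expression that rebuilds the object, which can be pasted straight into a question.

```r
# A few rows of the example data as a plain data frame.
df_small <- data.frame(
  patient_id = c(1001, 1001, 1004),
  diagnosis  = c("Z34", "A24", "Z34"),
  age        = c(20, 20, 20),
  status     = c("case", "case", "control")
)

# Prints a structure(...) call; paste that text into the question.
dput(df_small)

# Anyone can then recreate the data by evaluating the pasted text.
pasted  <- paste(capture.output(dput(df_small)), collapse = "\n")
df_back <- eval(parse(text = pasted))
```

For a large dataset, posting `dput(head(df, 20))` keeps the question readable while still giving others something to run.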