I have case patients and am trying to match controls based on age (easy part) and ICD10 codes using the MatchIt package in R. My problem is that there are multiple ICD codes for a given patient. For example, a 20-year-old case patient may have 2 ICD codes, and I want to find a control patient who is also 20 years old and has at least the same two ICD10 codes (the control patient may have more ICD10 codes, which is fine).

  patient_id diagnosis   age status 
        <dbl> <chr>     <dbl> <chr>  
 1       1001 Z34          20 case   
 2       1001 A24          20 case   
 3       1002 N39          22 case   
 4       1002 Z3A          22 case   
 5       1003 N89          23 case   
 6       1003 Z34          23 case   
 7       1004 Z34          20 control
 8       1004 A24          20 control
 9       1005 D50          23 control
10       1005 F41          23 control
11       1005 N89          23 control
12       1005 Z11          23 control
13       1005 Z34          23 control
14       1006 Z12          22 control
15       1006 Z34          22 control
16       1006 N39          22 control
17       1007 E66          20 control
18       1007 Z11          20 control
19       1007 Z12          20 control
20       1007 Z34          20 control

Here is what I have tried:

library(MatchIt)
library(dplyr)

m.out <- matchit(I(status == "case") ~ age, data = df,
                 exact = ~age + diagnosis,
                 method = "optimal",
                 distance = "glm", ratio = 1)

m.data <- match.data(m.out, subclass = "matched_id")

print(m.data)
   patient_id diagnosis   age status  distance weights matched_id
        <dbl> <chr>     <dbl> <chr>      <dbl>   <dbl> <fct>     
 1       1001 Z34          20 case       0.269       1 1         
 2       1001 A24          20 case       0.269       1 2         
 3       1002 N39          22 case       0.308       1 3         
 4       1003 N89          23 case       0.329       1 4         
 5       1003 Z34          23 case       0.329       1 5         
 6       1004 A24          20 control    0.269       1 2         
 7       1005 N89          23 control    0.329       1 4         
 8       1005 Z34          23 control    0.329       1 5         
 9       1006 N39          22 control    0.308       1 3         
10       1007 Z34          20 control    0.269       1 1   

As you can see, patient 1001 matched with 1007, but also matched with 1004. I only want 1001 to match with 1004 since they are both 20 years old and have the ICD codes Z34 & A24. Any help would be much appreciated.

There is 1 answer

Taren Sanders On

I think the problem is that your dataset is in long format (one row per diagnosis), so matchit treats every row as a separate unit and tries to find a match for it. The solution is to reshape the data to wide format, with one row per patient and a dummy-coded column for each diagnosis, then match on that. Keep in mind that you won't be able to match every patient if you request an exact match on diagnosis.

library(MatchIt)
library(dplyr, warn.conflicts = FALSE)

df <- tibble::tribble(
  ~patient_id, ~diagnosis, ~age, ~status,
  1001, "Z34", 20, "case",
  1001, "A24", 20, "case",
  1002, "N39", 22, "case",
  1002, "Z3A", 22, "case",
  1003, "N89", 23, "case",
  1003, "Z34", 23, "case",
  1004, "Z34", 20, "control",
  1004, "A24", 20, "control",
  1005, "D50", 23, "control",
  1005, "F41", 23, "control",
  1005, "N89", 23, "control",
  1005, "Z11", 23, "control",
  1005, "Z34", 23, "control",
  1006, "Z12", 22, "control",
  1006, "Z34", 22, "control",
  1006, "N39", 22, "control",
  1007, "E66", 20, "control",
  1007, "Z11", 20, "control",
  1007, "Z12", 20, "control",
  1007, "Z34", 20, "control"
)

df_wide <- tidyr::pivot_wider(
  df,
  names_from = diagnosis,
  values_from = diagnosis,
  values_fn = length,
  values_fill = 0
)

df_wide
#> # A tibble: 7 × 13
#>   patient_id   age status    Z34   A24   N39   Z3A   N89   D50   F41   Z11   Z12
#>        <dbl> <dbl> <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1       1001    20 case        1     1     0     0     0     0     0     0     0
#> 2       1002    22 case        0     0     1     1     0     0     0     0     0
#> 3       1003    23 case        1     0     0     0     1     0     0     0     0
#> 4       1004    20 control     1     1     0     0     0     0     0     0     0
#> 5       1005    23 control     1     0     0     0     1     1     1     1     0
#> 6       1006    22 control     1     0     1     0     0     0     0     0     1
#> 7       1007    20 control     1     0     0     0     0     0     0     1     1
#> # ℹ 1 more variable: E66 <int>

m.out <- matchit(I(status == "case") ~ age,
  data = df_wide,
  exact = ~ age - patient_id,
  method = "optimal",
  distance = "glm", ratio = 1
)

m.data <- match.data(m.out, subclass = "matched_id")

print(m.data)
#> # A tibble: 6 × 16
#>   patient_id   age status    Z34   A24   N39   Z3A   N89   D50   F41   Z11   Z12
#>        <dbl> <dbl> <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1       1001    20 case        1     1     0     0     0     0     0     0     0
#> 2       1002    22 case        0     0     1     1     0     0     0     0     0
#> 3       1003    23 case        1     0     0     0     1     0     0     0     0
#> 4       1005    23 control     1     0     0     0     1     1     1     1     0
#> 5       1006    22 control     1     0     1     0     0     0     0     0     1
#> 6       1007    20 control     1     0     0     0     0     0     0     1     1
#> # ℹ 4 more variables: E66 <int>, distance <dbl>, weights <dbl>,
#> #   matched_id <fct>

Created on 2023-07-11 with reprex v2.0.2

I only want 1001 to match with 1004 since they are both 20 years old and have the ICD codes Z34 & A24.

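Note that the call above only matches exactly on age (the "- patient_id" term removes nothing from the formula), so 1004 and 1007 have the same distance to 1001 and the output above happens to pick 1007. 1004 is the control that actually shares both Z34 and A24 with 1001; to enforce that, the diagnosis columns have to enter the matching as well.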
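
If the requirement is "same age, and the control has at least the case's ICD codes", one option is to build the pairs directly rather than through an exact-matching formula. The following is a minimal base R sketch under that assumption. It is not part of MatchIt, the helper name find_superset_controls is made up for illustration, and it pairs greedily in the order the cases appear, so it is not an optimal matching.

```r
# Toy data from the question, long format (one row per diagnosis).
df <- data.frame(
  patient_id = c(1001, 1001, 1002, 1002, 1003, 1003, 1004, 1004,
                 1005, 1005, 1005, 1005, 1005, 1006, 1006, 1006,
                 1007, 1007, 1007, 1007),
  diagnosis = c("Z34", "A24", "N39", "Z3A", "N89", "Z34", "Z34", "A24",
                "D50", "F41", "N89", "Z11", "Z34", "Z12", "Z34", "N39",
                "E66", "Z11", "Z12", "Z34"),
  age = c(20, 20, 22, 22, 23, 23, 20, 20, 23, 23, 23, 23, 23,
          22, 22, 22, 20, 20, 20, 20),
  status = c(rep("case", 6), rep("control", 14)),
  stringsAsFactors = FALSE
)

# Set of codes per patient, and one row per patient.
codes <- split(df$diagnosis, df$patient_id)
patients <- unique(df[, c("patient_id", "age", "status")])

# Hypothetical helper: greedy 1:1 pairing without replacement.
find_superset_controls <- function(patients, codes) {
  cases <- patients[patients$status == "case", ]
  controls <- patients[patients$status == "control", ]
  used <- c()
  out <- data.frame(case_id = cases$patient_id, control_id = NA_real_)
  for (i in seq_len(nrow(cases))) {
    case_codes <- codes[[as.character(cases$patient_id[i])]]
    for (j in seq_len(nrow(controls))) {
      ctrl_id <- controls$patient_id[j]
      if (ctrl_id %in% used) next                  # control already taken
      if (controls$age[j] != cases$age[i]) next    # exact match on age
      ctrl_codes <- codes[[as.character(ctrl_id)]]
      # Control must carry every code the case has (extras are allowed).
      if (all(case_codes %in% ctrl_codes)) {
        out$control_id[i] <- ctrl_id
        used <- c(used, ctrl_id)
        break
      }
    }
  }
  out
}

pairs <- find_superset_controls(patients, codes)
print(pairs)
```

On the example data this pairs 1001 with 1004 and 1003 with 1005, and leaves 1002 unmatched because no 22-year-old control has Z3A. With real data you would probably want to choose among several eligible controls using a distance rather than taking the first one found.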

Also, it's a good idea when you post your question to make it easy for others to copy the data. The dput() command can be useful for this.
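
As a small illustration of that suggestion (the object name small_df is just an example), dput() prints R code that recreates an object, and that printed code can be pasted straight into a question:

```r
small_df <- data.frame(
  patient_id = c(1001, 1001, 1004),
  diagnosis = c("Z34", "A24", "Z34"),
  age = c(20, 20, 20),
  status = c("case", "case", "control")
)

# Prints a structure(...) call that others can paste to rebuild the data.
dput(small_df)

# Round trip: capture the printed code and evaluate it again.
printed <- paste(capture.output(dput(small_df)), collapse = "\n")
rebuilt <- eval(parse(text = printed))
stopifnot(isTRUE(all.equal(rebuilt, small_df)))
```

For a large dataset, passing a small subset such as head(df, 20) to dput() keeps the pasted code manageable.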