Assume the following data:
library(cobalt)
library(dplyr)
set.seed(123)
# Extending lalonde data by `group` variable:
lalonde <- cbind(lalonde,
group = sample(c("A", "B", "AB"), size = 614, replace = TRUE))
# Creating variable `set` for AB group:
lalonde$set[lalonde$group == "AB"] <- sample(c(5, 10, 15, 3), sum(lalonde$group == "AB"), replace = TRUE)
# Filling 'set' column for `group` 'A' and 'B':
prob_80 <- sample(c(0, 1), nrow(lalonde), replace = TRUE, prob = c(0.8, 0.2))
lalonde$set[(lalonde$group == "A" | lalonde$group == "B") & prob_80 == 1] <- sample(c(5, 10, 15, 3), sum((lalonde$group == "A" | lalonde$group == "B") & prob_80 == 1), replace = TRUE)
lalonde$set[(lalonde$group == "A" | lalonde$group == "B") & is.na(lalonde$set)] <- sample(1:100, sum((lalonde$group == "A" | lalonde$group == "B") & is.na(lalonde$set)), replace = TRUE)
Now we have a group
variable containing either A
, B
, AB
and a variable named set
.
Now I want to fit a logistic PS model stratified by the set
values of group == "AB"
, predicting being in group AB
.
First I would pull the distinct values of set
, being in group == AB
.
unique_set_values <- unique(lalonde[lalonde$group == "AB", "set"]) %>%
print()
which are:
+ print()
[1] 5 15 10 3
I use those to get all observations which belong to one of the set
values:
filtered_data <- lalonde %>%
filter(set %in% unique_set_values)
Then I split the data and replace AB
with 1
, else 0
:
# For AB and A:
AB_A <- filtered_data %>%
filter(group %in% c("AB", "A")) %>%
mutate(group = ifelse(group == "AB", 1, 0))
# For AB and B:
AB_B <- filtered_data %>%
filter(group %in% c("AB", "B")) %>%
mutate(group = ifelse(group == "AB", 1, 0))
Now I can calculate the stratified PS for AB
and A
as well as AB
and B
:
# Creating a formula:
formula <- group ~ age + educ + race + married + nodegree + re74 + re75 + re78
But how to calculate the PS stratified by set
in such scenario?
I tried:
AB_A_PS <- AB_A %>%
group_by(set) %>%
mutate(pscore = glm(formula, data = ., family = binomial(link = "logit"))$fitted.values)
but all I get is an error:
Error in `mutate()`:
ℹ In argument: `pscore = predict(glm(formula, data = ., family = binomial(link = "logit")))`.
ℹ In group 1: `set = 3`.
Caused by error:
! `pscore` must be size 66 or 1, not 238.
So, obviously, it does not work.
Thank you
You are passing the whole grouped data frame to
glm
for each group within the data frame, hence the error. Instead, you can pass the subset of the data frame that contains only the rows in the current group: