I have a system of n dummy variables/set indicators. I want to automatically identify any nesting. Take a simple n = 3 case.
expand.grid(0:1, 0:1, 1)
Var1 Var2 Var3
0 0 1
1 0 1
0 1 1
1 1 1
E.g. Here Var1 and Var2 are not nested ("incomparable"), but both are nested (subsets of) in Var3. This is obviously a partially ordered set (ordered by the relation =<) so I would like to automatically identify the poset and plot its hasse diagram automatically. I see there are some packages for hasse diagrams but they seem like to not do this common and simple job very naturally. Is there an easy way?
The following begins to do the job but is hardly elegant (thanks to help from @danlooo below):
find_subset_poset <- \(data){
is_nested <- \(df, col1, col2){
all(pull(df, ensym(col1)) <= pull(df, ensym(col2)))
}
get_all_pairs = \(data){
data %>%
names() %>%
combinat::combn(2,) %>%
t() %>%
as_tibble(.name_repair = "unique") %>%
set_names(c("col1", "col2"))
}
all_pairs = get_all_pairs(data)
is_eq = \(df, col1, col2){
all(pull(df, ensym(col1)) == pull(df, ensym(col2)))
}
equal_indicator_cols = all_pairs %>% filter(pmap_lgl(., is_eq, df = data))
# UNITE EQUAL COLUMNS
while(nrow(equal_indicator_cols) > 0){
sames = unlist(equal_indicator_cols[1, ])
message("The following indicator columns are equal: ", sames)
y = c(unlist(data[sames[1]]))
# REMOVE DUPLICATE SETS
data[sames] <- NULL
nm <- sym(str_c(sames, collapse = ","))
# ADD A SINGLE REPLACEMENT
data = data %>% mutate(!!nm := y)
# RECURSIVELY LOOK FOR FURTHER MATCHES
all_pairs = get_all_pairs(data)
equal_indicator_cols = all_pairs %>% filter(pmap_lgl(., is_eq, df = data))
}
all_pairs %>%
pmap(\(col1, col2) count(data, !!ensym(col1), !!ensym(col2))) %>%
map(print)
all_ordered_pairs_of_non_equal_indicators =
all_pairs %>%
rowwise() %>%
group_split() %>%
map(combinat::permn) %>%
map_depth(2, ~set_names(.x, c("col1", "col2"))) %>%
flatten() %>%
reduce(rbind)
all_ordered_pairs_of_non_equal_indicators %>%
filter(pmap_lgl(., is_nested, df = data)) %>%
select(col1, col2) %>%
as.matrix() %>%
POSetR::poset(elements = names(data)) %>%
plot()
}
Let data be a table of indicator variables w.l.o.g. e.g. col1 and col2 describing a poset with implicit ordering <=. col1 is nested in col2 if and only if they are not crossed. col1 and col2 are crossed if and only if all possible pairs of the unique values of col 1 and col2 (here 0 and 1) are present as rows in data.
Created on 2022-03-16 by the reprex package (v2.0.0)