Hierarchical clustering with constraints


I am trying to create 10 clusters from a segment variable while also satisfying a constraint: the total of a bound variable within each cluster should be at least 10,000. Here's what I have done so far:

# Prepare data
set.seed(20240124)
data <- tigris::counties(
  state = "IA", cb = TRUE, year = 2020, progress_bar = FALSE
) |>
  dplyr::select(GEOID) |>
  dplyr::mutate(
    segment_var = rexp(99),
    bound_var = rexp(99, rate = 0.0001)
  )
dplyr::glimpse(data)
#> Rows: 99
#> Columns: 4
#> $ GEOID       <chr> "19075", "19149", "19117", "19025", "19111", "19163", "191…
#> $ geometry    <MULTIPOLYGON [°]> MULTIPOLYGON (((-93.02679 4..., MULTIPOLYGON …
#> $ segment_var <dbl> 1.40702949, 3.06134427, 3.01756730, 0.91757114, 0.82408377…
#> $ bound_var   <dbl> 7625.26661, 2171.95732, 2688.26497, 369.21961, 4603.78202,…

# Find clusters
clusters <- data |>
  sf::st_drop_geometry() |>
  dplyr::select(segment_var) |>
  dplyr::mutate(segment_var = scale(segment_var) |> as.vector()) |>
  dist() |>
  hclust() |>
  cutree(k = 10)

# Augment data with cluster info
data_clustered <- data |>
  dplyr::mutate(clust_id = as.factor(clusters))
dplyr::glimpse(data_clustered)
#> Rows: 99
#> Columns: 5
#> $ GEOID       <chr> "19075", "19149", "19117", "19025", "19111", "19163", "191…
#> $ geometry    <MULTIPOLYGON [°]> MULTIPOLYGON (((-93.02679 4..., MULTIPOLYGON …
#> $ segment_var <dbl> 1.40702949, 3.06134427, 3.01756730, 0.91757114, 0.82408377…
#> $ bound_var   <dbl> 7625.26661, 2171.95732, 2688.26497, 369.21961, 4603.78202,…
#> $ clust_id    <fct> 1, 2, 2, 3, 3, 4, 3, 5, 5, 6, 7, 8, 7, 8, 3, 8, 6, 7, 3, 8…

# Get the summary of bound variable by cluster
data_clustered |>
  sf::st_drop_geometry() |>
  dplyr::summarize(
    sum_bound_var = sum(bound_var),
    .by = clust_id
  ) |>
  dplyr::pull(sum_bound_var) |>
  range()
#> [1]   4860.222 234915.239

Created on 2024-01-24 with reprex v2.0.2

How can I force this additional constraint on the bound variable when producing clusters?
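
One possible direction, offered as a rough sketch rather than an established method: `hclust()` and `cutree()` have no argument for a per-cluster minimum total, so the constraint can be approximated in a post-processing step. The sketch below cuts the tree into more clusters than needed and then greedily merges the cluster with the smallest `bound_var` total into its nearest neighbor (by mean of the scaled segment variable) until every cluster reaches the threshold. The helper `merge_small_clusters()` and the starting value `k = 20` are hypothetical, chosen only for illustration, and the data are simulated with base R instead of the county geometries.

```r
# Hypothetical helper (not part of any package): greedily merge the cluster
# with the smallest bound total into the nearest cluster by mean of x,
# until every cluster total is at least min_total or one cluster remains.
merge_small_clusters <- function(x, bound, clust, min_total) {
  clust <- as.integer(clust)
  repeat {
    totals <- vapply(split(bound, clust), sum, numeric(1))
    if (length(totals) <= 1 || all(totals >= min_total)) break
    # Label of the cluster with the smallest total
    small <- names(totals)[which.min(totals)]
    # Cluster centers on the (scaled) segment variable
    centers <- vapply(split(x, clust), mean, numeric(1))
    others <- centers[names(centers) != small]
    # Nearest other cluster by distance between centers
    nearest <- names(others)[which.min(abs(others - centers[[small]]))]
    clust[clust == as.integer(small)] <- as.integer(nearest)
  }
  clust
}

# Simulated stand-in for the question's data (assumption: same distributions)
set.seed(20240124)
x <- as.vector(scale(rexp(99)))
bound <- rexp(99, rate = 0.0001)

# Start from more clusters than needed, since merging reduces the count
start <- cutree(hclust(dist(x)), k = 20)
merged <- merge_small_clusters(x, bound, start, min_total = 10000)

# Inspect the resulting totals and the number of clusters left
range(vapply(split(bound, merged), sum, numeric(1)))
length(unique(merged))
```

This does not guarantee exactly 10 clusters: the final count depends on the starting `k` and on how many merges are needed, so the starting `k` may need to be adjusted by trial. Merged labels are also no longer consecutive, so wrap them in `as.factor()` before attaching them to the data.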
