Group IDs in R: some common, some dissimilar

I want to make an ID for group members that is common within some groups and dissimilar (unique per member) within others. Only a few groups need dissimilar IDs, so I can specify those manually in the code or as a list.

Imagine data like this:

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)

Basically, I want Group1 to have a common ID (Alice = 1, Bob = 1), Group2 to have globally unique IDs that vary within the group (Charlie = 2, David = 3, Eve = 4), and then to switch back to a common ID for Group3 (and Group4).

The ideal final data would look like this:

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John"),
  ID = c(1, 1, 2, 3, 4, 5, 5, 6, 6, 6)
)

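I am most familiar with dplyr, so I have been experimenting with:

library(dplyr)

data %>%
  group_by(Group) %>%
  mutate(ID = row_number())

and variations thereof, including case_when and ifelse statements. That numbers rows within each group (1, 2, 1, 2, 3, ...) instead of giving IDs that continue across groups.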

Ideally, the code would use case_when or ifelse inside mutate so that I can name the n groups whose members should get different IDs, and let something like the TRUE ~ branch of case_when assign a globally unique, shared ID to each of the remaining groups.

Answer by jared_mamrot (accepted)

Perhaps this would suit your use-case?

library(dplyr, warn = FALSE)

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)

data %>%
  mutate(Group_number = row_number(), .by = Group) %>%
  mutate(tmp = case_when(Group == "Group2" ~ 1,
                         Group == "Group4" ~ 1,
                         Group_number == 1 ~ 1,
                         TRUE ~ 0)) %>%
  mutate(ID = cumsum(tmp)) %>%
  select(-c(Group_number, tmp))
#>     Group  Member ID
#> 1  Group1   Alice  1
#> 2  Group1     Bob  1
#> 3  Group2 Charlie  2
#> 4  Group2   David  3
#> 5  Group2     Eve  4
#> 6  Group3   Frank  5
#> 7  Group3   Grace  5
#> 8  Group4   Helen  6
#> 9  Group4     Ivy  7
#> 10 Group4    John  8

Created on 2023-10-18 with reprex v2.0.2
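The step that turns the flags into IDs is the running sum. Every row that should receive a new ID gets tmp = 1 (each row of a dissimilar group, plus the first row of every other group), every other row gets tmp = 0, and cumsum() then increments the ID only on flagged rows. A minimal illustration, with the tmp vector for the version above (Group2 and Group4 dissimilar) written out by hand:

```r
# tmp as computed by case_when() for the ten example rows (Group2 and Group4 dissimilar):
# 1 = this row starts a new ID, 0 = this row reuses the previous row's ID
tmp <- c(1, 0, 1, 1, 1, 1, 0, 1, 1, 1)

# The running total gives each flagged row the next ID in sequence
ID <- cumsum(tmp)
print(ID)
#> [1] 1 1 2 3 4 5 5 6 7 8
```

Because the IDs follow row order, this assumes each group's rows sit next to each other, as in the example. If they might not, sorting first (for example with arrange()) stops a later row of a shared-ID group from inheriting the previous row's ID, which could belong to another group.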


EDIT: based on a follow-up comment, only Group2 should be 'dissimilar' (Group4 shares one ID):

library(dplyr, warn = FALSE)

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)

data %>%
  mutate(Group_number = row_number(), .by = Group) %>%
  mutate(tmp = case_when(Group == "Group2" ~ 1,
                         # Group == "Group4" ~ 1,
                         Group_number == 1 ~ 1,
                         TRUE ~ 0)) %>%
  mutate(ID = cumsum(tmp)) %>%
  select(-c(Group_number, tmp))
#>     Group  Member ID
#> 1  Group1   Alice  1
#> 2  Group1     Bob  1
#> 3  Group2 Charlie  2
#> 4  Group2   David  3
#> 5  Group2     Eve  4
#> 6  Group3   Frank  5
#> 7  Group3   Grace  5
#> 8  Group4   Helen  6
#> 9  Group4     Ivy  6
#> 10 Group4    John  6

Created on 2023-10-18 with reprex v2.0.2
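Since the question asks to specify the dissimilar groups as a list, the hard-coded Group == conditions could be replaced by a single %in% test against a character vector. A sketch under that assumption; dissimilar_groups and result are hypothetical names, and the .by argument needs dplyr 1.1.0 or later, as in the answer above:

```r
library(dplyr, warn.conflicts = FALSE)

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)

# Hypothetical vector of groups whose members should each get their own ID
dissimilar_groups <- c("Group2")

result <- data %>%
  mutate(Group_number = row_number(), .by = Group) %>%
  mutate(tmp = case_when(
    Group %in% dissimilar_groups ~ 1,  # every row of a dissimilar group starts a new ID
    Group_number == 1 ~ 1,             # the first row of any other group starts a new ID
    TRUE ~ 0                           # remaining rows reuse the previous ID
  )) %>%
  mutate(ID = cumsum(tmp)) %>%
  select(-c(Group_number, tmp))

result$ID
#> [1] 1 1 2 3 4 5 5 6 6 6
```

Setting dissimilar_groups <- c("Group2", "Group4") should reproduce the first output in this answer.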

Answer by Mark

Here's another way:

library(dplyr, warn.conflicts = FALSE)  # for mutate() and lag()

# Flag the groups whose members should get individual IDs; setNames() labels the
# flags with the groups in the order they first appear in the Group column
different <- c(FALSE, TRUE, FALSE, FALSE) |> setNames(unique(data$Group))

# A row starts a new ID if its group is flagged as different, or if it is the first row of a new group
data |> mutate(ID = cumsum(different[Group] | Group != lag(Group, default = "")))

Note: if your groups are literally named as in the example ("Group1", "Group2", ...), it's probably better to remove the "Group" part and turn them into integers.
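A base-R sketch of that integer version, assuming every label has the form "Group<k>" with k running 1, 2, 3, ... so that k can index the lookup vector by position; group_id and new_group are hypothetical names. Here diff() stands in for dplyr's lag() to detect where a new group starts:

```r
data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)

# Strip the "Group" prefix: "Group2" -> 2L (assumes labels are exactly "Group<k>")
data$group_id <- as.integer(sub("^Group", "", data$Group))

# Position k says whether group k is "different" (one ID per member)
different <- c(FALSE, TRUE, FALSE, FALSE)

# TRUE on the first row and wherever the group number changes from the previous row
new_group <- c(TRUE, diff(data$group_id) != 0)

# cumsum() of a logical vector counts the TRUEs so far, giving integer IDs
data$ID <- cumsum(different[data$group_id] | new_group)
print(data$ID)
#> [1] 1 1 2 3 4 5 5 6 6 6
```

Indexing by position avoids the setNames() step, but it silently misbehaves if the group numbers have gaps or do not start at 1, so the named lookup is the safer choice for arbitrary labels.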