Multiple points with same label in ggrepel - avoid redundant labels

I want to annotate a scatterplot in which several points share the same label. I want to label all of them (not just some of them), but the plot becomes a mess with so many redundant labels. Is there any way to have a single label pointing to all the points that share it, using ggrepel::geom_text_repel?

I attach the simplest possible situation:

library(ggplot2)
library(ggrepel)

df <- data.frame(
  group = c("A", "A", "B", "B"),
  x = c(1, 2, 3, 4),
  y = c(2, 3, 4, 5)
)
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_text_repel(aes(label = group), box.padding = 0.5, force = 4)

Now: [image: the plot with a separate label on every point]

What I want: [image: the plot with a single label per group pointing to all of its points]

PS: @user2862862 asked the same question in 2019 ("One label for multiple points"), but it did not receive a proper answer.

There is 1 answer

Answer by L Tyrone (best answer)

Here's an approach to achieve what you want. However, as @AllanCameron alluded to in the comments, the appropriateness of this approach is heavily data dependent. I have included extra examples to illustrate potential issues.

This method involves computing the mean x and y for each group, then creating two more dataframes: one for the lines from each point to its group mean (df1), and one for the labels placed at the group means (df2):

library(dplyr)
library(tidyr)
library(ggplot2)

# Your example data (Example1). IMPORTANT: the x and y columns are renamed to
# x_1 and y_1 so that pivot_longer() below can split the column names on "_"
df <- data.frame(
  group = c("A", "A", "B", "B"),
  x_1 = c(1, 2, 3, 4),
  y_1 = c(2, 3, 4, 5)
)

# Create df for plotting lines from original points to group's mean point
df1 <- df %>%
  group_by(group) %>%
  mutate(x_2 = mean(x_1),
         y_2 = mean(y_1)) %>%
  pivot_longer(-group,
               names_to = c(".value", "var"),
               names_sep = "_")

# Create df for single group label
df2 <- df %>%
  group_by(group) %>%
  mutate(x_2 = mean(x_1),
         y_2 = mean(y_1)) %>%
  select(-ends_with("1")) %>%
  distinct()

# Plot
ggplot() +
  geom_path(data = df1,
            aes(x, y, group = group),
            colour = "grey") +
  geom_point(data = df,
             aes(x_1, y_1),
             size = 2) +
  geom_text(data = df2,
            aes(x_2, y_2, label = group),
            size = 5)
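
For comparison, here is a more compact sketch of the same idea. It is a variant, not the code above: it assumes summarise() for the group means, left_join() to attach them to the points, geom_segment() in place of geom_path(), and geom_label() so that the label's box covers the line ends. The names centres, segs and p are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Same data as Example1 above
df <- data.frame(
  group = c("A", "A", "B", "B"),
  x_1 = c(1, 2, 3, 4),
  y_1 = c(2, 3, 4, 5)
)

# One row per group: the mean position, where the single label will sit
centres <- df %>%
  group_by(group) %>%
  summarise(x_2 = mean(x_1), y_2 = mean(y_1), .groups = "drop")

# Attach each group's mean position to its points (start and end of each line)
segs <- left_join(df, centres, by = "group")

p <- ggplot() +
  geom_segment(data = segs,
               aes(x = x_1, y = y_1, xend = x_2, yend = y_2),
               colour = "grey") +
  geom_point(data = df,
             aes(x_1, y_1),
             size = 2) +
  geom_label(data = centres,
             aes(x_2, y_2, label = group),
             size = 5)
p
```

Because each segment is its own row, no line is drawn twice, and the original x and y column names could be kept if you adjust the aes() calls. The same caveat about crossing lines still applies.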

Now consider two other example dataframes. Each one overwrites df, so re-run the df1, df2 and plotting code above after creating it:

# Example2
df <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x_1 = c(1, 1.5, 2, 3, 3.5, 4),
  y_1 = c(2, 4, 3, 4, 2.5, 5)
)

# Example3
set.seed(1)
df <- data.frame(group = rep(c("A", "B", "C"), each = 10),
                 x_1 = runif(n = 30, min = 1, max = 4),
                 y_1 = runif(n = 30, min = 2, max = 5))

[Figure: Example1, Example2 and Example3 plotted with the method above]

Example1 and Example2 look OK, but your proposed method doesn't scale well to data like Example3: the lines for the different groups cross, which makes the plot hard to interpret. If your full data are more complex and contain many points, as in Example3, colour (or shape) communicates the grouping much more effectively:

ggplot() +
  geom_point(data = df,
             aes(x_1, y_1, colour = group),
             size = 2) +
  labs(x = "x", y = "y")

[Figure: Example3 with points coloured by group]
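
If colour alone is not enough (for example in greyscale print), shape can be mapped to the same variable. A minimal sketch, assuming the Example3 data; the object name p_shape is only for illustration:

```r
library(ggplot2)

# Example3 data, recreated so that this block runs on its own
set.seed(1)
df <- data.frame(group = rep(c("A", "B", "C"), each = 10),
                 x_1 = runif(n = 30, min = 1, max = 4),
                 y_1 = runif(n = 30, min = 2, max = 5))

# Map both colour and shape to group; ggplot2 merges them into one legend
p_shape <- ggplot(df, aes(x_1, y_1, colour = group, shape = group)) +
  geom_point(size = 2) +
  labs(x = "x", y = "y")
p_shape
```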