Coloring plotly's boxplot box lines by factor

244 views Asked by At

I have a data.frame with two factor variables (type and age in df below) and a single numeric variable (value in df) that I'd like to plot using R's plotly package as a grouped boxplot.

Here's the data.frame:

set.seed(1)
df <- data.frame(type = c(rep("t1", 1000), rep("t2", 1000), rep("t3", 1000), rep("t4", 1000), rep("t5", 1000), rep("t6", 1000)),
                 age = rep(c(rep("y", 500),rep("o", 500)), 6),
                 value = rep(c(runif(500, 5, 10), runif(500, 7.5, 12.5)), 6),
                 stringsAsFactors = F)
df$age <- factor(df$age, levels = c("y", "o"), ordered = T)

Here's how I'm currently plotting it:

library(plotly)
library(dplyr)
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type, type = "box", showlegend = F) %>%
  layout(yaxis = list(title = "Diversity"), boxmode = "group", boxgap = 0, boxgroupgap = 0)

Which gives: enter image description here

My question is whether it is possible to color the lines of the boxes by df$age?

I know that for coloring all the boxes with a single color (e.g., #AFB1B5) I can add to the plot_ly function:

line = list(color = "#AFB1B5")

But that would color all box lines similarly whereas what I'm trying to do is to color them differently by df$age.

1

There are 1 answers

0
Kat On

There is a way to do this that's not that too complicated, but rather ugly. Or something that is over the top cumbersome (I didn't realize how far I was digging until I was done...)

Before I go too far... I noticed that there is a ton of white space and that you have gaps set to zero. You can add the parameter offsetgroup and get rid of a lot more whitespace. Check it out:

plot_ly(data = df,
        x = ~type, y = ~value, name = ~age, offsetgroup = ~type, # <- I'm new!
        color = ~type, type = "box", showlegend = F) %>%
  layout(yaxis = list(title = "Diversity"), 
         boxmode = "group", boxgap = 0, boxgroupgap = 0)

enter image description here

With the not-too-complicated-but-kind-of-ugly method

The line is the box outline, the median line, the lines from Q1 to the lower fence, from Q3 to the upper fence, and the whiskers.

I assigned the plot to the object plt for this code. When I checked the object, it didn't have the data element, so I built the plot next.

plt <- plotly_build(plt)

Then I added colors with lapply.

# this looks ugly!
lapply(1:12,
       function(i){
         nm = plt$x$data[[i]]$name
         cr = ifelse(nm == "o",
                     "#66FF66", "black")
         plt$x$data[[i]]$line$color <<- cr  # change graph by age
       }
)
plt

enter image description here

With the ridiculous-amount-of-code-for-a-few-lines-but-looks-better method

I guess it isn't a few lines. It's 48 lines.

For this method, you need to build the plot like I did in the before (plotly_build), so that the data element is in the plt object.

Then you have to determine the first and third quantile, the IQR, the max and min values between the quantiles and 1.5 * IQR for each type and age grouping so that you have the y values for the lines.

I wrote a function to get the upper and lower fences.

fen <- function(vals){
  iq = 1.5 * IQR(vals)
  q3 = quantile(vals, 3/4)              # top of the box
  uf = q3 + iq                          # top of the fence
  vt = max(vals[vals > q3 & vals < uf]) # max value in range
  q1 = quantile(vals, 1/4)              # btm of the box
  bf = q1 - iq                          # btm of the fence
  vb = min(vals[vals < q1 & vals > bf]) # min value in range
  sz = function(no){
    if(length(no) > 1) {no = no[[1]]}
    return(no)
  }
  vt = sz(vt)
  vb = sz(vb)
  return(list(vt, vb))
}

Then I used this function and the data to determine the remaining values needed to draw the lines.

df1 <- df %>% 
  # have to reverse the order or it won't line up
  mutate(age = factor(age, levels = c("o", "y"), ordered = T)) %>% 
  group_by(type, age) %>% 
  summarise(ufen = fen(value)[[1]],     # top of the fence
            q3 = quantile(value, 3/4),  # top of the box
            q1 = quantile(value, 1/4),  # btm of the box
            dfen = fen(value)[[2]])     # btm of the fence

To plot these new lines, I used shapes which is equivalent to ggplot2 annotations. (annotations in Plotly is primarily for text.)

There are several steps to drawing these lines. First I've started with some things that are essentially the same in every line. After that is a vector that helps place the lines on the x-axis.

# line shape basics; the same for every line
tellMe <- function(shade){
  list(type = "line",
       line = list(color = shade),
       xref = "paper",
       yref = "y")
}

# setup for placing lines on the x-axis; these are % of space
xers = c(rep(.0825, 4), rep(.083 * 3, 4), rep(.083 * 5, 4))

Now four lapply statements: the upper fences, the lower fences, the upper whiskers, and the lower whiskers.

lns <- lapply(1:12,
              function(i) {                     # upper fence lines
                nm = ifelse(df1[i, ]$age == "o",
                            "#66FF66", "black")
                xb = 1/12 * (i - 1)
                xn = xb + (1/6 * xers[[i]])
                more = tellMe(nm)
                c(x0 = xn, x1 = xn,
                  y0 = df1[i, ]$q3[[1]], # it's named; this makes it val only
                  y1 = df1[i, ]$ufen, more)
              })
mlns <- lapply(1:12,
               function(i) {                    # lower fence lines
                 nm = ifelse(df1[i, ]$age == "o",
                             "#66FF66", "black")
                 xb = 1/12 * (i - 1)
                 xn = xb + (1/6 * xers[[i]])
                 more = tellMe(nm)
                 c(x0 = xn, x1 = xn,
                   y0 = df1[i, ]$q1[[1]], # it's named; this makes it val only
                   y1 = df1[i, ]$dfen, more)
               })

# default whisker width is 1/2 the width of the box
# current boxes of 1/4 of the space by type
# with domain [0, 1], the box width is 1/12 * .5, so
# the whisker width is
ww = 1/12 * .5 *.5
# already have the center, so half on each side...
ww = ww * .5

wwlns <- lapply(1:12,
              function(i) {                     # upper fence whisker
                nm = ifelse(df1[i, ]$age == "o",
                            "#66FF66", "black")
                xb = 1/12 * (i - 1)
                xn = xb + (1/6 * xers[[i]])
                more = tellMe(nm)
                c(x0 = xn - ww, x1 = xn + ww,
                  y0 = df1[i, ]$ufen, y1 = df1[i, ]$ufen,
                  more)
              })
wwm <- lapply(1:12,
               function(i) {                     # lower fence whisker
                 nm = ifelse(df1[i, ]$age == "o",
                             "#66FF66", "black")
                 xb = 1/12 * (i - 1)
                 xn = xb + (1/6 * xers[[i]])
                 more = tellMe(nm)
                 c(x0 = xn - ww, x1 = xn + ww,
                   y0 = df1[i, ]$dfen, y1 = df1[i, ]$dfen,
                   more)
               })

Now you have to concatenate the lists and add them to the plot.

# combine shapes
shp <- append(lns, mlns)
shp <- append(shp, wwlns)
shp <- append(shp, wwm)

plt %>% layout(shapes = shp)

There are OBVIOUSLY better color choices out there.

enter image description here