I have a data.frame with two factor variables (type and age in df below) and a single numeric variable (value in df) that I'd like to plot using R's plotly package as a grouped boxplot.
Here's the data.frame:
set.seed(1)
df <- data.frame(type = c(rep("t1", 1000), rep("t2", 1000), rep("t3", 1000), rep("t4", 1000), rep("t5", 1000), rep("t6", 1000)),
                 age = rep(c(rep("y", 500), rep("o", 500)), 6),
                 value = rep(c(runif(500, 5, 10), runif(500, 7.5, 12.5)), 6),
                 stringsAsFactors = FALSE)
df$age <- factor(df$age, levels = c("y", "o"), ordered = TRUE)
Here's how I'm currently plotting it:
library(plotly)
library(dplyr)
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type, type = "box", showlegend = FALSE) %>%
  layout(yaxis = list(title = "Diversity"), boxmode = "group", boxgap = 0, boxgroupgap = 0)
My question is whether it is possible to color the lines of the boxes by df$age?
I know that for coloring all the boxes with a single color (e.g., #AFB1B5) I can add to the plot_ly function:
line = list(color = "#AFB1B5")
But that would give all box lines the same color, whereas what I'm trying to do is color them differently by df$age.
There is a way to do this that's not too complicated, but rather ugly. Or there is something that is over-the-top cumbersome (I didn't realize how far I was digging until I was done...)
Before I go too far... I noticed that there is a ton of white space and that you have gaps set to zero. You can add the parameter offsetgroup and get rid of a lot more whitespace.
With the not-too-complicated-but-kind-of-ugly method
The line is the box outline, the median line, the lines from Q1 to the lower fence, from Q3 to the upper fence, and the whiskers.
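A minimal sketch of this approach, not a definitive implementation. It uses a smaller stand-in for the question's df, and the colors in age_cols are hypothetical. It also assumes that each built trace's name holds the age level, which is used both to set offsetgroup and to pick the line color.

```r
library(plotly)

# Smaller stand-in with the same structure as the question's df (assumption, for brevity).
set.seed(1)
df <- data.frame(type = rep(paste0("t", 1:6), each = 100),
                 age = rep(rep(c("y", "o"), each = 50), times = 6),
                 value = rep(c(runif(50, 5, 10), runif(50, 7.5, 12.5)), 6),
                 stringsAsFactors = FALSE)
df$age <- factor(df$age, levels = c("y", "o"), ordered = TRUE)

plt <- plot_ly(x = df$type, y = df$value, name = df$age, color = df$type,
               type = "box", showlegend = FALSE) %>%
  layout(yaxis = list(title = "Diversity"),
         boxmode = "group", boxgap = 0, boxgroupgap = 0)

# plot_ly() is lazy; plotly_build() creates the traces in plt$x$data.
plt <- plotly_build(plt)

# Hypothetical colors, one per age level.
age_cols <- c(y = "#1B9E77", o = "#D95F02")

plt$x$data <- lapply(plt$x$data, function(tr) {
  age <- as.character(tr$name)[1]  # assumption: the trace name is the age level
  if (age %in% names(age_cols)) {
    tr$offsetgroup <- age          # boxes of the same age share one slot per type
    tr$line <- modifyList(as.list(tr$line), list(color = age_cols[[age]]))
  }
  tr
})

# plt  # print the object to render the plot
```

The fill still follows type while every line of the box follows age, which is what makes this version look a bit rough.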
I assigned the plot to the object plt for this code. When I checked the object, it didn't have the data element, so I built the plot next.
Then I added colors with lapply.
With the ridiculous-amount-of-code-for-a-few-lines-but-looks-better method
I guess it isn't a few lines. It's 48 lines.
For this method, you need to build the plot like I did before (plotly_build), so that the data element is in the plt object.
Then you have to determine the first and third quartiles, the IQR, and the max and min values that fall within 1.5 * IQR of those quartiles, for each type and age grouping, so that you have the y values for the lines.
I wrote a function to get the upper and lower fences.
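A function of that kind could look roughly like the sketch below. The name box_fences is hypothetical, and it uses quantile() with R's default method, which may differ slightly from the quartile method Plotly uses, so treat the values as approximate.

```r
# Hypothetical helper: the most extreme values that still fall within
# 1.5 * IQR of the first and third quartiles.
box_fences <- function(v) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)  # Q1 and Q3, R's default method
  iqr <- q[2] - q[1]
  c(lower = min(v[v >= q[1] - 1.5 * iqr]),
    upper = max(v[v <= q[2] + 1.5 * iqr]))
}

# 100 lies beyond Q3 + 1.5 * IQR, so the upper fence stops at 5.
box_fences(c(1, 2, 3, 4, 5, 100))
```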
Then I used this function and the data to determine the remaining values needed to draw the lines.
To plot these new lines, I used shapes, which is the equivalent of ggplot2 annotations. (annotations in Plotly is primarily for text.)
There are several steps to drawing these lines. First I've started with some things that are essentially the same in every line. After that is a vector that helps place the lines on the x-axis.
Now four lapply statements: the upper fences, the lower fences, the upper whiskers, and the lower whiskers.
Now you have to concatenate the lists and add them to the plot.
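The steps above can be sketched as follows. This is an illustration under stated assumptions, not the original 48 lines: it uses a smaller stand-in for df, hypothetical colors and helper names (age_cols, line_shape), R's default quantile() method, and x positions that only hold for boxgap = 0, boxgroupgap = 0, the default whisker width, and one offsetgroup per age level taken from the trace names.

```r
library(plotly)

# Smaller stand-in with the same structure as the question's df (assumption).
set.seed(1)
df <- data.frame(type = rep(paste0("t", 1:6), each = 100),
                 age = rep(rep(c("y", "o"), each = 50), times = 6),
                 value = rep(c(runif(50, 5, 10), runif(50, 7.5, 12.5)), 6),
                 stringsAsFactors = FALSE)
df$age <- factor(df$age, levels = c("y", "o"), ordered = TRUE)

plt <- plot_ly(x = df$type, y = df$value, name = df$age, color = df$type,
               type = "box", showlegend = FALSE) %>%
  layout(boxmode = "group", boxgap = 0, boxgroupgap = 0)
plt <- plotly_build(plt)

# Assumption: each trace's name is its age level; reuse it as the offsetgroup.
trace_age <- vapply(plt$x$data, function(tr) as.character(tr$name)[1], character(1))
for (i in seq_along(plt$x$data)) plt$x$data[[i]]$offsetgroup <- trace_age[i]

# Quartiles and fences for every type and age grouping.
stats <- do.call(rbind, lapply(split(df, list(df$type, df$age), drop = TRUE), function(g) {
  q <- quantile(g$value, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  data.frame(type = g$type[1], age = as.character(g$age[1]), q1 = q[1], q3 = q[2],
             lower = min(g$value[g$value >= q[1] - 1.5 * iqr]),
             upper = max(g$value[g$value <= q[2] + 1.5 * iqr]))
}))

# x positions: categories sit at 0, 1, 2, ...; each age gets an equal slot inside.
types <- sort(unique(df$type))  # assumed category order on the x-axis
slots <- unique(trace_age)      # slot order follows first appearance in the traces
n <- length(slots)
stats$xc <- (match(stats$type, types) - 1) + (match(stats$age, slots) - 0.5) / n - 0.5
half_cap <- 0.5 / n * 0.5       # half the box width times the default whisker width

age_cols <- c(y = "#1B9E77", o = "#D95F02")  # hypothetical colors
cols <- age_cols[stats$age]

# The parts that are the same in every line.
line_shape <- function(x0, x1, y0, y1, col) {
  list(type = "line", xref = "x", yref = "y", x0 = x0, x1 = x1, y0 = y0, y1 = y1,
       line = list(color = col, width = 2))
}

rows <- seq_len(nrow(stats))
# Vertical lines from Q3 up to the upper fence and from Q1 down to the lower fence.
upper_fences <- lapply(rows, function(i) line_shape(stats$xc[i], stats$xc[i], stats$q3[i], stats$upper[i], cols[[i]]))
lower_fences <- lapply(rows, function(i) line_shape(stats$xc[i], stats$xc[i], stats$q1[i], stats$lower[i], cols[[i]]))
# Horizontal caps at the ends of the whiskers.
upper_whiskers <- lapply(rows, function(i) line_shape(stats$xc[i] - half_cap, stats$xc[i] + half_cap, stats$upper[i], stats$upper[i], cols[[i]]))
lower_whiskers <- lapply(rows, function(i) line_shape(stats$xc[i] - half_cap, stats$xc[i] + half_cap, stats$lower[i], stats$lower[i], cols[[i]]))

# Concatenate the four lists and add them to the plot.
shapes <- c(upper_fences, lower_fences, upper_whiskers, lower_whiskers)
plt <- layout(plt, shapes = shapes)
```

If the gaps, the whisker width, or the number of groups change, the xc and half_cap calculations need to change with them, since the shapes are drawn independently of the boxes.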
There are OBVIOUSLY better color choices out there.