Intro:
I'm running into trouble plotting some errorbars on a grouped barplot.
I'm trying to adapt some code that was originally used for a non-grouped barplot that I used to make the following figure:
The Problem:
Now I am trying to plot multiple air pollutants for each site.
I am melting two separate dataframes (one with mean values, one with confidence intervals), and then joining them together. I've successfully made a grouped barplot; however, the error bars look crazy!
How can I correctly map my errorbars so they mimic the non-grouped barplot above?
Reproducible Example:
See my entire data provenance below:
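## packages used below:
library(reshape2)  # melt()
library(dplyr)     # full_join()
library(ggplot2)   # ggplot(), geom_bar(), geom_errorbar()
## mean values generated from raw data for each pollutant by site: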
df.mean <- structure(list(id = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"), co_mean = c(0.00965315315315315, 0.201591548253404, 0.180300223214286, 0.14681828358209, 0.136609422703303), no_mean = c(2.09379071379071, 7.17386693309651, 5.11211979166667, 7.070375, 8.84492922564529), no2_mean = c(2.90698198198198, 15.3616940497336, 14.4540014880952, 17.8782126865672, 9.94047529836248), o3_mean = c(0.848970893970894, 19.6143709295441, 18.0919508928571, 19.1743544776119, 23.300829170136)), class = c("tbl_df", "tbl", "data.frame"), .Names = c("id", "co_mean", "no_mean", "no2_mean", "o3_mean"), row.names = c(NA, -5L))
## confidence intervals generated from raw data for each pollutant by site:
df.ci <- structure(list(id = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"), co_ci = c(0.00247560132518893, 0.00347796717254879, 0.00376771895817099, 0.025603853701267, 0.00232362415184514), no_ci = c(0.955602056071903, 0.179936357209358, 0.166243603959864, 0.413094097187208, 0.20475667069271), no2_ci = c(0.975169763947207, 0.251717055459865, 0.230073674418165, 0.479358833879918, 0.148588790912564), o3_ci = c(0.22710620006376, 0.283390020715785, 0.279702181925963, 0.754017640698111, 0.376479324970397)), class = c("tbl_df", "tbl", "data.frame"), .Names = c("id", "co_ci", "no_ci", "no2_ci", "o3_ci"), row.names = c(NA, -5L))
## convert each df to long-format:
df.mean.long <- melt(df.mean)
df.ci.long <- melt(df.ci)
## join two long dfs back together for plotting:
df.long.join <- full_join(df.mean.long, df.ci.long, by="id")
## generate confidence intervals relative to each mean:
limits <- aes(ymax = value.x + value.y, ymin = value.x-value.y) ## this is likely the problem!
## create our barplot:
barplot <- ggplot(df.long.join, aes(x=id, y=value.x, fill = variable.x)) +
  geom_bar(position="dodge", stat="identity") +
  geom_errorbar(limits, position = "dodge", width = 0.25)
barplot
Thank you in advance!
Your join is adding extra rows, and therefore extra error bars, because there are four matching copies for each level of id in each data frame. The error bars also are not dodged by the same amount as the bars: with position = "dodge", the dodge distance follows each geom's own width, and the error bars are set to width = 0.25 while the bars use the default width.
The code below shapes the data so as to get the desired join and also uses faceting to avoid the need for a legend. You can switch the x variable and faceting variable, depending on which comparisons you want to highlight.
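As a quick check on the first point (separate from the reshaping that follows), here is a minimal sketch of how joining on id alone multiplies the rows. The frames mean.long and ci.long and their placeholder values are made up for illustration; only the shape (5 sites, 4 pollutants) mirrors the question's data.

```r
library(dplyr)

# Hypothetical long-format frames: 5 sites x 4 pollutants each.
# The numeric values are placeholders, not the question's data.
ids <- c("A", "B", "C", "D", "E")
mean.long <- expand.grid(id = ids,
                         variable = c("co_mean", "no_mean", "no2_mean", "o3_mean"),
                         stringsAsFactors = FALSE)
mean.long$value <- seq_len(nrow(mean.long))

ci.long <- expand.grid(id = ids,
                       variable = c("co_ci", "no_ci", "no2_ci", "o3_ci"),
                       stringsAsFactors = FALSE)
ci.long$value <- seq_len(nrow(ci.long)) / 10

# Joining on id only: every mean row for a site matches all four
# ci rows for that site, so each site contributes 4 x 4 = 16 rows.
joined <- full_join(mean.long, ci.long, by = "id")

nrow(mean.long)          # 20 rows going in
nrow(joined)             # 80 rows coming out
sum(joined$id == "A")    # 16 rows for a single site
```

Each of the 20 bars therefore ends up with four different error bars drawn on it. Recent dplyr versions may also warn about a many-to-many relationship for this join, which is a useful hint that the join keys are incomplete.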
To shape the data, the goal is to join on both id and pollutant, so we need to get each data frame in long format and get common pollutant names in each data frame.
We first put df.mean in long format using gather (a tidyr function that is essentially the equivalent of melt from the reshape2 package). The separate call gives us a column with just the pollutant abbreviation, without _mean appended. Then we get rid of the unneeded mean column that was created with separate (although we don't have to do this).
Now we do the same thing to df.ci, but we also change the name of the value column to ci so that it will be different from the value column we created in df.mean.
The left_join combines the two reshaped data frames into a single data frame ready for plotting.
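The steps described above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the exact code the steps refer to: df.mean and df.ci are rebuilt here with the question's values rounded to about three significant figures so the block runs on its own, and the leftover column name suffix, the shared position object pd, and the plot names p.facet and p.grouped are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rounded copies of the question's data (assumption: rounding is for brevity only).
df.mean <- data.frame(
  id       = factor(c("A", "B", "C", "D", "E")),
  co_mean  = c(0.00965, 0.2016, 0.1803, 0.1468, 0.1366),
  no_mean  = c(2.094, 7.174, 5.112, 7.070, 8.845),
  no2_mean = c(2.907, 15.36, 14.45, 17.88, 9.940),
  o3_mean  = c(0.849, 19.61, 18.09, 19.17, 23.30)
)
df.ci <- data.frame(
  id     = factor(c("A", "B", "C", "D", "E")),
  co_ci  = c(0.00248, 0.00348, 0.00377, 0.0256, 0.00232),
  no_ci  = c(0.956, 0.180, 0.166, 0.413, 0.205),
  no2_ci = c(0.975, 0.252, 0.230, 0.479, 0.149),
  o3_ci  = c(0.227, 0.283, 0.280, 0.754, 0.376)
)

# Long format, then split "co_mean" into pollutant = "co" and a leftover column.
df.mean.long <- df.mean %>%
  gather(key, value, -id) %>%
  separate(key, into = c("pollutant", "mean"), sep = "_") %>%
  select(-mean)

# Same for the intervals, renaming value to ci so the two value columns differ.
df.ci.long <- df.ci %>%
  gather(key, value, -id) %>%
  separate(key, into = c("pollutant", "suffix"), sep = "_") %>%
  select(-suffix) %>%
  rename(ci = value)

# Join on both keys: one row per site and pollutant.
df.long.join <- left_join(df.mean.long, df.ci.long, by = c("id", "pollutant"))

# Faceted version: one panel per site, so no legend is needed.
p.facet <- ggplot(df.long.join, aes(x = pollutant, y = value, fill = pollutant)) +
  geom_col() +
  geom_errorbar(aes(ymin = value - ci, ymax = value + ci), width = 0.25) +
  facet_grid(. ~ id) +
  theme(legend.position = "none")

# Grouped version: bars and error bars share one dodge width so they line up.
pd <- position_dodge(width = 0.9)
p.grouped <- ggplot(df.long.join, aes(x = id, y = value, fill = pollutant)) +
  geom_col(position = pd) +
  geom_errorbar(aes(ymin = value - ci, ymax = value + ci),
                position = pd, width = 0.25)
```

Printing p.facet or p.grouped draws the plot. Swapping id and pollutant between x and facet_grid changes which comparison is emphasized; the grouped version keeps the original layout and relies on the shared pd object to keep each error bar centered on its bar.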