Overlay line plots in ggplot2


I've created a multi-line graph using ggplot2, where each line represents a year plotted against month (see the link below). Volume is on the y-axis.

http://imgur.com/3Rwdjyi

Here is the code I used to plot the figure above:

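library(ggplot2)
library(scales)  # provides comma() for the y-axis labels

# cPalette (not shown in the post) is assumed to be a vector of colours, one per year
ggplot(data=df26, aes(x=Month, y=C1, group=Year, colour=factor(Year))) +
    geom_line(size=.75) + geom_point() +
    scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul",
        "Aug","Sep","Oct","Nov","Dec")) +
    scale_y_continuous(labels=comma) +
    scale_colour_manual(values=cPalette, name="Year") +
    ylab("Volume")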

Question: How do I add another line to the plot that shows the mean volume for each month, with control over that line's thickness and color? So far, all of my attempts have been unsuccessful (most likely because I'm relatively new to R). Any help is much appreciated!

Edit: Dataframe df26 is provided below (as requested by a commenter):

Year  Month   C1
2010  Jan NA
2010  Feb NA
2010  Mar NA
2010  Apr NA
2010  May NA
2010  Jun NA
2010  Jul NA
2010  Aug 183.6516764
2010  Sep 120.6303348
2010  Oct 85.31007613
2010  Nov 13.7347988
2010  Dec 20.93950545
2011  Jan 13.35780833
2011  Feb 14.16910945
2011  Mar 9.786319721
2011  Apr 41.24848885
2011  May 122.3014387
2011  Jun 422.4012809
2011  Jul 539.8569592
2011  Aug 527.6301222
2011  Sep 385.8199781
2011  Oct 201.7846973
2011  Nov 27.91934061
2011  Dec 7.919004379
2012  Jan 10.22724424
2012  Feb 10.64391791
2012  Mar 88.06585438
2012  Apr 124.0320675
2012  May 325.1399457
2012  Jun 465.938168
2012  Jul 567.2273488
2012  Aug 459.769634
2012  Sep 333.8636373
2012  Oct 102.0607986
2012  Nov 23.18822051
2012  Dec 15.64841121
2013  Jan 7.458238256
2013  Feb 4.34972039
2013  Mar 26.2019396
2013  Apr 38.82781323
2013  May 257.0920645
2013  Jun 357.594195
2013  Jul 383.2780483
2013  Aug 456.469314
2013  Sep 319.3616298
2013  Oct NA
2013  Nov NA
2013  Dec 17.01748185

Answer by Gregor Thomas (best answer):

You need to calculate the means first; then you can plot them. Using dplyr:

library(dplyr)
# One row per month: the mean of C1 across years, ignoring missing values
df26means <- df26 %>%
    group_by(Month) %>%
    summarize(C1 = mean(C1, na.rm = TRUE))
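
For anyone without dplyr, the same monthly means can be sketched in base R with aggregate. This is an alternative to the dplyr call, not part of the original answer, and the toy data frame below is a hypothetical two-month stand-in for df26 with values rounded from the question:

```r
# Toy stand-in for df26: Jan and Aug only, values rounded from the question's data
toy <- data.frame(
  Year  = rep(2010:2013, times = 2),
  Month = rep(c("Jan", "Aug"), each = 4),
  C1    = c(NA, 13.36, 10.23, 7.46, 183.65, 527.63, 459.77, 456.47)
)

# The formula interface drops rows with NA in C1 before averaging
# (default na.action = na.omit), which plays the role of na.rm = TRUE here
toy_means <- aggregate(C1 ~ Month, data = toy, FUN = mean)
print(toy_means)
```

Jan is averaged over its three non-missing years and Aug over all four. One difference from the dplyr version: a month with no non-missing values at all would be dropped from the aggregate result rather than returned as NaN.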

Then add it to your plot:

ggplot(data=df26, aes(x=Month, y=C1, group=Year, colour=factor(Year))) + 
    geom_line(size=.75) + geom_point() + 
    scale_x_discrete(limits=c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) + 
    scale_y_continuous(labels=comma) + 
    scale_colour_manual(values=cPalette, name="Year") + 
    ylab("Volume") +
    # Mean line: aes(group = 1) overrides group=Year so the monthly means form a single line
    geom_line(data = df26means, aes(group = 1), size = 1.25, color = "black")

I'd recommend using annotate to add a nice piece of text on the plot identifying that line as the mean line. To get it in the legend, you'd probably need to set df26means$Year = "Mean", convert df26$Year to a character, rbind the two dataframes together, then convert Year to a factor. The plot code would be simpler, but the data wrangling is a bit more complicated.
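
The legend approach described above might look roughly like the sketch below. It is only an illustration under assumptions: the small df26 here is a hypothetical stand-in (two years, three months, values rounded from the question), the colours are arbitrary, base R aggregate is swapped in for the dplyr step to keep the example short, and the name combined is made up for the stacked data frame.

```r
library(ggplot2)

# Toy stand-in for df26 (hypothetical subset: two years, three months)
df26 <- data.frame(
  Year  = rep(c(2011, 2012), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  C1    = c(13.36, 14.17, 9.79, 10.23, 10.64, 88.07)
)

# Monthly means, labelled "Mean" in the Year column
df26means <- aggregate(C1 ~ Month, data = df26, FUN = mean)
df26means$Year <- "Mean"

# Convert Year to character, stack the two data frames, then make Year a factor
df26$Year <- as.character(df26$Year)
combined <- rbind(df26, df26means[, names(df26)])
combined$Year <- factor(combined$Year, levels = c("2011", "2012", "Mean"))

p <- ggplot(combined, aes(x = Month, y = C1, group = Year, colour = Year)) +
  geom_line(size = 0.75) +
  geom_point() +
  # Redraw the mean line thicker on top; it keeps its colour mapping,
  # so it still shares the "Mean" legend entry
  geom_line(data = subset(combined, Year == "Mean"), size = 1.25) +
  scale_x_discrete(limits = c("Jan", "Feb", "Mar")) +
  scale_colour_manual(
    values = c("2011" = "steelblue", "2012" = "darkorange", "Mean" = "black"),
    name = "Year"
  ) +
  ylab("Volume")
```

Because "Mean" is now just another level of Year, a single colour scale covers every line and the mean appears in the legend automatically. The size argument follows the code above; recent ggplot2 releases warn that linewidth is the preferred name for line thickness.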