Problems plotting multiple functions and data points together using ggplot2 in R


I am trying to plot 5 functions and 5 data points on the same plot using the ggplot2 package. The code works when I plot only the functions, but as soon as I add the data points it takes a very long time to process. If I add just one point, it takes around 10 minutes to plot, and as soon as I add more than 3 points R just freezes.

A reproducible example is shown below (sorry for the lengthy code, but in my case I need to integrate piecewise functions, and I expect that may be part of the reason why it takes so long):

rm(list=ls())
library(ggplot2)
library(reshape2)
library(mosaic)

#Input x-values
T1 <- 3*24*3600
T2 <- 5*24*3600
T3 <- 15*24*3600
T4 <- 61*24*3600

#Input functions
V1=makeFun(75*exp(-x/50000)~x) 
V2=makeFun(1000*exp(-x/60000)~x) 
V3=makeFun(100*exp(-x/275000)~x)
V4=makeFun(125*exp(-(x-1300000)/800000)~x) 

f1=makeFun(V1(x)+V2(x)+V3(x)~x)
f2=makeFun(V2(x)+V3(x)~x)
f3=makeFun(V3(x)~x)
f4=makeFun(V4(x)~x)

#4 piecewise functions 
v <-  function(x) 
  (integrate(function(x)
    (x > 0 & x <= T1)*f1(x)/1000000000 + (x > T1 & x <= T2)*f2(x)/1000000000 + (x > T2 & x <= T3)*f3(x)/1000000000+ (x > T3 & x <= T4)*f4(x)/1000000000
    , lower=0, upper=x)$value)
Vo0<- Vectorize(v)

v_w <-  function(x) 
  (integrate(function(x)
    (x >= 0 & x <= T1)*V1(x)/1000000000
    , lower=0, upper=x)$value)
Vo1<- Vectorize(v_w)

v_s <-  function(x) 
  (integrate(function(x)
    (x >= 0 & x <= T2)*V2(x)/1000000000 
    , lower=0, upper=x)$value)
Vo2 <- Vectorize(v_s)

v_n1 <-  function(x) 
  (integrate(function(x)
    (x >= 0 & x <= T3)*V3(x)/1000000000 
    , lower=0, upper=x)$value)
Vo3 <- Vectorize(v_n1)

v_l <-  function(x) 
  (integrate(function(x)
    (x > T3 & x <= T4)*V4(x)/1000000000
    , lower=0, upper=x)$value)
Vo4 <- Vectorize(v_l)

#Point1 
x1<- 61*24*3600
y1 <- 0.205139861

# Point2
x2 <- 3*24*3600
y2 <- 0.004566857 

#Point3
x3 <- 5*24*3600
y3 <- 0.062331177 

#Point4
x4 <- 15*24*3600
y4<- 0.031999923

#Point5
x5 <- 46*24*3600
y5 <- 0.10585637 

#Input values needed to make the ggplot
x <-(0:T4)
d <- 3600*24

p2 <- ggplot(data.frame(x = c(0:T4)), aes(x = x)) +
  stat_function(fun=Vo0, size=1.5)+
  stat_function(fun=Vo1, color= "blue")+
  stat_function(fun=Vo2, color= "red")+
  stat_function(fun=Vo3, color= "green")+
  stat_function(fun=Vo4, color= "orange")+
  scale_x_continuous(sec.axis = sec_axis(~./(d), name="Days [d]"))+
  labs(x=expression("Seconds [s]"))+
  labs(y=expression("Volume [km3]")) +
  theme_bw(base_family = "Times", base_size = 18)  +
  theme(plot.title = element_text(hjust=0.5))
print(p2)

p3 <- p2 + geom_point(aes(x1,y1), size=3, color="black") 
#     geom_point(aes(x2,y2), size=1.5, color="blue")+
#    geom_point(aes(x3,y3), size=1.5, color="red")+
#   geom_point(aes(x4,y4), size=1.5, color="green")+
#  geom_point(aes(x5,y5), size=1.5, color="orange")

print(p3)

Printing p2 does not take too long, but printing p3 takes 10 minutes on my computer (OS X 10.9.5), and if I uncomment the last 4 points R just freezes.

So my question is basically: how can I rewrite the code so that functions (similar to mine) and multiple data points can be plotted together using ggplot2? Any advice would be highly appreciated, since I have been stuck on this for a week now. Thanks for your help!

Answer (by Brian):

Put your points in a data frame. If you reference the point variables directly from the global environment, as you are doing, geom_point inherits the plot's data frame and recycles each single value to every row of it. In other words, it draws a point at x1 = 61*24*3600, y1 = 0.205139861 once for every row of the data frame you passed to ggplot(), which has 61*24*3600 + 1 (about 5.27 million) rows. No wonder it takes forever.

df.points <- data.frame(xs = c(x1, x2, x3, x4, x5), 
                        ys = c(y1, y2, y3, y4, y5), 
                        ids = paste0("Vo", 0:4))

p2 + geom_point(data = df.points, aes(xs, ys, color = ids), size=3) +
  scale_color_manual(values = c("black", "blue", "red", "green", "orange"))
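
To see the recycling for yourself, here is a minimal sketch on a deliberately small data frame. The names base, px, py and pts are made up for illustration and are not from the question; layer_data() returns the data a layer actually draws. Newer ggplot2 releases may also print a warning for the recycled case suggesting annotate() or a one-row data frame.

```r
library(ggplot2)

# Hypothetical stand-in for the question's setup: a base data frame with
# 1,000 rows instead of ~5.27 million, so this runs instantly.
base <- data.frame(x = 0:999)

# A single point stored as loose variables, as in the question.
px <- 500
py <- 0.2

# This layer inherits `base`, so the length-1 values are recycled to every row.
p_recycled <- ggplot(base, aes(x = x)) + geom_point(aes(px, py))

# This layer gets its own one-row data frame, so only one point is drawn.
pts <- data.frame(xs = px, ys = py)
p_single <- ggplot(base, aes(x = x)) + geom_point(data = pts, aes(xs, ys))

# Count the rows each point layer ends up drawing.
n_recycled <- nrow(layer_data(p_recycled, 1))
n_single   <- nrow(layer_data(p_single, 1))
print(c(recycled = n_recycled, single = n_single))
```

For a one-off point, annotate("point", x = px, y = py) is another way to avoid inheriting the plot's data.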

While you're at it, there's no reason for your initial data frame to have more than 5 million rows. stat_function evaluates the function at n evenly spaced points across the x range (n = 101 by default), so as long as it knows the range of your desired x values, it doesn't need that kind of resolution. Replace your first ggplot call with:

p2 <- ggplot(data.frame(x = seq(0, T4, length.out = 100)), aes(x = x)) + ...

This substantially improves the speed.
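
As a rough check that only the x range matters, the sketch below passes just the two endpoints and inspects what stat_function computes. curve_fun is a hypothetical, cheap stand-in for the integrated functions in the question, and n = 101 simply spells out the default.

```r
library(ggplot2)

T4 <- 61 * 24 * 3600  # upper x limit from the question, in seconds

# Hypothetical cheap stand-in for the question's integrated curves.
curve_fun <- function(x) 1 - exp(-x / 1e6)

# Two rows are enough here: they only tell ggplot2 the x range.
p <- ggplot(data.frame(x = c(0, T4)), aes(x = x)) +
  stat_function(fun = curve_fun, n = 101)

# The curve layer contains n evaluated points, regardless of the input rows.
curve_data <- layer_data(p, 1)
print(nrow(curve_data))
```

If a curve looks jagged, raising n in stat_function is the knob to turn; the cost then grows with n rather than with the number of rows in the plot's data frame.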


With those changes, the plot printed in less than 10 seconds.