I have two datasets, each containing a value measured at several time points. I would like to visualize the ratio between the two datasets with a line chart. However, the time points of the measurements do not overlap completely, and some experiments contain more values than others.

Data

df_1 <- as.data.frame(cbind(c(1,2,3,4,5,6,7,8,9,10), c(1,4,7,8,9.5,17,41,27, 26, 10)))
df_2 <- as.data.frame(cbind(c(0.1, 0.5, 2, 3, 4.4,5.3,6,7,8,10,12,15,16,20), c(0.1,2,4,6,7.5,6.3,9,19,22,23,24,51,31,23)))

Plots

library(tidyverse)

ggplot()+
   geom_line(data = df_1, aes(x=V1, y=V2), col = "black") + 
   geom_line(data = df_2, aes(x=V1, y=V2), col = "red")

Next, I would like to plot the ratio between the two curves. However, since the data points are not evenly spaced (and were measured at slightly different times), I cannot simply divide the values at each time point and plot the result.

Question

How do you calculate the ratio between two unevenly spaced dataframes (which can be of different lengths) in R?

Additional example

I have two example datasets (df_1 and df_2). The third dataset (df_3) is the expected output, shown as the green line: the ratio df_2/df_1.

df_1 <- as.data.frame(cbind(time = c(1:5), value = c(1:5)))
df_2 <- as.data.frame(cbind(time = c(1.5, 2, 3,4), value = c(2,2,2.5,3.5)))
df_3 <- as.data.frame(cbind(time = c(1.5, 2, 3,4), value = c(1.33, 1, 0.83, 0.875)))

ggplot() + 
  geom_point(data = df_1, aes(x=time, y=value), col = "black", size = 3) + 
  geom_point(data = df_2, aes(x=time, y=value), col = "red", size =3) +
  geom_line(data = df_1, aes(x=time, y = value), col = "black") + 
  geom_line(data = df_2, aes(x = time, y = value), col = "red") + 
  geom_point(data =df_3, aes(x = time, y = value), col = "green", size = 3) +
  geom_line(data=df_3, aes(x=time, y = value), col = "green")

1 Answer

Cettt

You can do a full join using the dplyr package. Note that the data frames I am using are the same as yours but the columns are named time and value.

Linear interpolation on the ratio

df_1 <- as.data.frame(cbind(time = c(1,2,3,4,5,6,7,8,9,10), value = c(1,4,7,8,9.5,17,41,27, 26, 10)))
df_2 <- as.data.frame(cbind(time = c(0.1, 0.5,2,3,4.4,5.3,6,7,8,10,12,15,16,20), value = c(0.1,2,4,6,7.5,6.3,9,19,22,23,24,51,31,23)))

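library(dplyr)

# Ratio is computed where both values exist (note: value_1/value_2 here);
# approx() then fills the NA gaps linearly, and rule = 2 holds the nearest
# known ratio constant outside the range of shared time points.
df_1 %>% 
  full_join(df_2, by = "time", suffix = c("_1", "_2")) %>%
  arrange(time) %>% 
  mutate(ratio = value_1 / value_2,
         ratio = approx(time, ratio, xout = time, rule = 2)$y)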

   time value_1 value_2     ratio
1   0.1      NA     0.1 1.0000000
2   0.5      NA     2.0 1.0000000
3   1.0     1.0      NA 1.0000000
4   2.0     4.0     4.0 1.0000000
5   3.0     7.0     6.0 1.1666667
6   4.0     8.0      NA 1.4074074
7   4.4      NA     7.5 1.5037037
8   5.0     9.5      NA 1.6481481
9   5.3      NA     6.3 1.7203704
10  6.0    17.0     9.0 1.8888889
11  7.0    41.0    19.0 2.1578947
12  8.0    27.0    22.0 1.2272727
13  9.0    26.0      NA 0.8310277
14 10.0    10.0    23.0 0.4347826
15 12.0      NA    24.0 0.4347826
16 15.0      NA    51.0 0.4347826
17 16.0      NA    31.0 0.4347826
18 20.0      NA    23.0 0.4347826
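
To make the behaviour of approx() in the call above easier to follow, here is a minimal base-R sketch on made-up toy vectors (x, y and xout are purely illustrative and not part of the data above). As far as I know, approx() discards pairs with a missing y before interpolating; rule = 1 then returns NA outside the range of the remaining x values, while rule = 2 repeats the value at the nearest end point.

```r
# Toy data, purely illustrative: the middle y value is missing
x <- c(1, 2, 3)
y <- c(10, NA, 30)
xout <- c(0, 2, 4)

# rule = 1: no extrapolation, points outside [1, 3] become NA
inside_only <- approx(x, y, xout = xout, rule = 1)$y

# rule = 2: the value at the nearest end point is used outside [1, 3]
held_constant <- approx(x, y, xout = xout, rule = 2)$y

print(inside_only)    # NA 20 NA
print(held_constant)  # 10 20 30
```

This is why rows 1 to 3 and rows 15 to 18 of the table above simply repeat the first and the last ratio that could be computed from both datasets.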

Linear interpolation of value_1 on df_2

# Interpolate df_1 at the time points of df_2, so that every time point
# in df_2 gets a matching value_1. With rule = 1, time points outside the
# range of df_1 (here below 1 or above 10) get NA instead of a value.
# The ratio value_2 / value_1 is calculated in the next step.

int_df_1 <- as.data.frame(approx(df_1$time, df_1$value, xout = df_2$time, rule = 1))
names(int_df_1) <- c("time", "value")

# Again full join + division of df_2 by interpolated df_1

int_df_1 %>% 
   full_join(df_2, by = "time", suffix = c("_1", "_2")) %>%
   arrange(time) %>% 
   mutate(ratio = value_2/value_1)
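
As a quick check, the same idea can be applied to the small example from the question. The sketch below uses base R only and assumes the small df_1 and df_2 from the "Additional example" (rebuilt here with data.frame()); value_1_at_df_2 and ratio_df are names introduced just for this illustration.

```r
# Small example data from the question
df_1 <- data.frame(time = 1:5, value = 1:5)
df_2 <- data.frame(time = c(1.5, 2, 3, 4), value = c(2, 2, 2.5, 3.5))

# Interpolate df_1 at the time points of df_2 (NA outside the range of df_1)
value_1_at_df_2 <- approx(df_1$time, df_1$value, xout = df_2$time, rule = 1)$y

# Ratio df_2 / df_1 at every time point of df_2
ratio_df <- data.frame(time = df_2$time, ratio = df_2$value / value_1_at_df_2)
print(ratio_df)
```

The resulting ratios (about 1.33, 1, 0.83 and 0.875) agree with the expected df_3. ratio_df can then be passed to geom_line(data = ratio_df, aes(x = time, y = ratio)) in the same way as the other curves.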