How to remove one-off increases in value

87 views Asked by At

I'm working with time series data on a variable that generally increases slowly over time. Very simplified example:

df <- data.frame(index=1:8, value = c(rep(0.25, 3),1.95,0.25,rep(0.5,3)))

 index value
     1  0.25
     2  0.25
     3  0.25
     4  1.95
     5  0.25
     6  0.50
     7  0.50
     8  0.50

A recurring feature of the dataset is what happens at index 4: the value spikes upward then immediately comes back down again. I want to remove those values. (There are also points in my dataset where the value makes a small increase followed by a small decrease some time later, but I want to keep those.)

I have found a way of removing the values, by using diff to calculate the change from the previous value, then turning the data frame upside down, using diff again to calculate the change from the next value and removing rows where the two diffs are the same, but that seems like the least efficient process ever:

library(dplyr)

df %>% 
  mutate(diffprev = diff(value) %>% c(0, .)) %>% 
  arrange(desc(variable)) %>% 
  mutate(diffnext = diff(value) %>% c(0, .)) %>% 
  filter(diffprev == 0 | diffprev != diffnext)

I realise that if the spike in value happened at index 5 rather than 4 this wouldn't work but in the full dataset this is so unlikely that unless there's a simple fix I'm not going to worry about it. But what would be a better way of going about this?

2

There are 2 answers

2
jeremycg On BEST ANSWER

You could try:

df %>% filter(lag(value) != lead(value) | (value - lag(value)) %in% c(0, NA))

You might also be interested in the lag and lead functions from dplyr.

Edit: thanks @Frank for a couple modifications

0
Frank On

You don't need to rearrange. The first diff column you make contains all the info you need:

df %>% 
  mutate(diffprev = diff(value) %>% c(0, .)) %>%
  filter(diffprev == 0 | diffprev != -lead(diffprev) ) %>%
  select(-diffprev)

which gives

  variable value
1        1  0.25
2        2  0.25
3        3  0.25
4        5  0.25
5        6  0.50
6        7  0.50
7        8  0.50