Lagged difference between rows in R: a different take

My question is similar to a few that have been asked before, but I hope different enough to warrant a separate question.

See here and here. I'll reuse some of the example data from those questions. For context, I am looking at how my observed catch rate (of sea creatures) changed over multiple days of sampling the same area.

I want to calculate the difference between the first sample day at a given site (the first row for each letter in the data below) and each subsequent sample day (the following rows with the same letter).

 #Example data   
 df <- data.frame(
 id = c("A", "A", "A", "A", "B", "B", "B"), 
 num = c(1, 8, 6, 3, 7, 7 , 9),
 What_I_Want = c(NA, 7, 5, 2, NA, 0, 2))

The first solution that I found calculates a lagged difference between consecutive rows. I also wanted this calculation, so it was helpful to find:

# Calculate lagged differences (requires dplyr)
library(dplyr)

df_new <- df %>%
  # group by site id
  group_by(id) %>%
  # difference between each row and the previous row in the same group
  mutate(diff = num - lag(num))

Here the difference is between A.1 and A.2, then between A.2 and A.3, and so on.
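As a quick check of what "lagged difference within each group" means for the example data, here is a minimal base R sketch. It is not the dplyr code above; it assumes only the id and num columns from the example and uses ave with diff as a stand-in for group_by plus lag. The name lagged is just illustrative.

```r
# Example data from the question (id = site, num = catch rate)
df <- data.frame(
  id  = c("A", "A", "A", "A", "B", "B", "B"),
  num = c(1, 8, 6, 3, 7, 7, 9)
)

# Within each id, subtract the previous row from the current row.
# diff() returns one value fewer than its input, so pad the first
# position of every group with NA.
lagged <- ave(df$num, df$id, FUN = function(x) c(NA, diff(x)))

print(lagged)
```

Each group restarts with NA, because the first row of a site has no previous row to compare against.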

What I would like to do now is calculate the difference with respect to the first value of each group. So for letter A, I would like to calculate 8 - 1, then 6 - 1, and finally 3 - 1 (matching the What_I_Want column above). Any suggestions?

One clunky solution (linked above) is to create a separate column for each lag distance and then somehow merge the results that I want:

df_clunky <- df %>%
  group_by(id) %>%
  mutate(
    deltaLag1 = num - lag(num, 1),
    deltaLag2 = num - lag(num, 2))

There is 1 answer

Answer by lmo (best answer)

Here is a base R method with replace and ave:

ave(df$num , df$id, FUN=function(x) replace(x - x[1], 1, NA))
[1] NA  7  5  2 NA  0  2

ave applies the anonymous function to the num values within each id. Inside that function, x - x[1] subtracts the group's first element from every element of the group, and replace then sets the first element of the result to NA.
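Since the question uses dplyr, the same idea can be sketched there as well. This is not part of the original answer; it assumes dplyr is installed, and the names df_first and diff_first are made up for illustration. first(num) picks the first value within each group, and replace blanks out the first row exactly as in the base R version.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id  = c("A", "A", "A", "A", "B", "B", "B"),
  num = c(1, 8, 6, 3, 7, 7, 9)
)

df_first <- df %>%
  group_by(id) %>%
  # difference from the first value in each group; the first row
  # itself is set to NA rather than 0
  mutate(diff_first = replace(num - first(num), 1, NA)) %>%
  ungroup()

print(df_first)
```

If a 0 in the first row of each group is acceptable, the replace call can be dropped and num - first(num) used on its own.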