I need to calculate the difference between non-consecutive records of a variable, grouped by another variable. That is, for each group I want to take the last value of the variable in a run and subtract it from the first value in that group's next run (if there is one).
I know I can use rleid along with shift to calculate differences between consecutive rows, but this time those consecutive differences are exactly what I need to exclude.
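For context, here is a minimal sketch of that consecutive-row approach on made-up toy data. The names toy, g, x, run and d are illustrative only and are not part of the example data below.

```r
library(data.table)

# Hypothetical toy data, unrelated to the example table in this question
toy <- data.table(g = c("a", "a", "b", "a", "a"),
                  x = c(1, 3, 10, 4, 9))

# rleid() numbers each run of identical consecutive values of g
toy[, run := rleid(g)]

# Within each run, subtract the previous row's x;
# shift() lags by one row by default, so the first row of a run gets NA
toy[, d := x - shift(x), by = run]
```

This only yields differences inside a run, which is the part that has to be skipped here.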
Example data
dput(iris)
structure(list(Sepal.Length = c(4.4, 6.3, 4.6, 5.8, 6.4, 6.5,
4.9, 5.4, 6.4, 6.7), Sepal.Width = c(3, 2.8, 3.1, 2.7, 2.7, 3,
3.6, 3.9, 2.8, 3.1), Petal.Length = c(1.3, 5.1, 1.5, 4.1, 5.3,
5.5, 1.4, 1.7, 5.6, 4.7), Petal.Width = c(0.2, 1.5, 0.2, 1, 1.9,
1.8, 0.1, 0.4, 2.1, 1.5), Species = c("setosa", "virginica",
"setosa", "versicolor", "virginica", "virginica", "setosa", "setosa",
"virginica", "versicolor")), .Names = c("Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
library(data.table)
setDT(iris, key = "Sepal.Width")
I think something like
iris[, diff(Petal.Width), by = .(Species, !rleid(Species))]
(of course this doesn't work!) is what I need, but can't think of anything to achieve it.
Expected result (diffing Petal.Width):
      Species   V1
1: versicolor  0.5
2:  virginica -0.3
3:     setosa  0.0
4:     setosa -0.1
(I achieved it by running iris[, diff(Petal.Width), by = .(Species)] and then hand-picking rows with .Last.value[c(1, 4, 5, 6)])
Well, there's
Or...
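One possible sketch of the computation described in the question, which may differ from the approaches originally posted here: label the runs with rleid, reduce each run to its first and last Petal.Width, then within each Species subtract a run's last value from the next run's first value. The table dt simply re-types the question's Species and Petal.Width columns in the order produced by keying on Sepal.Width. The names dt, run, runs, run_first, run_last and res are assumptions made for illustration.

```r
library(data.table)

# Species and Petal.Width from the question, already ordered by Sepal.Width
dt <- data.table(
  Species = c("versicolor", "virginica", "virginica", "virginica", "setosa",
              "virginica", "setosa", "versicolor", "setosa", "setosa"),
  Petal.Width = c(1.0, 1.9, 1.5, 2.1, 0.2, 1.8, 0.2, 1.5, 0.1, 0.4)
)

# Label each run of identical consecutive Species values
dt[, run := rleid(Species)]

# One row per run: the first and last Petal.Width seen in that run
runs <- dt[, .(run_first = Petal.Width[1L], run_last = Petal.Width[.N]),
           by = .(Species, run)]

# Within each Species: first value of the next run minus last value of this run.
# The final run of a Species has no successor, so its NA row is dropped.
res <- runs[, .(V1 = shift(run_first, type = "lead") - run_last),
            by = Species][!is.na(V1)]
res
```

On this data, res should contain the same four differences as the expected result in the question. With the question's own table, the same steps could be applied to iris directly after the setDT call.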