Start with this code:
library(dplyr)
library(tibble)
library(lubridate)
set.seed(0)
the_df <- tibble(date = seq.Date(ymd('20230101'), ymd('20230101') + days(9), by = 'days'),
                 lead1 = 1:10,
                 lead2 = runif(10),
                 lead3 = runif(10),
                 lead4 = runif(10))
the_df %>%
  mutate(lag2 = lead2 - dplyr::lag(lead1, 1),
         lag3 = lead3 - dplyr::lag(lead2, 1),
         lag4 = lead4 - dplyr::lag(lead3, 1))
The output will look like:
Now imagine a tibble with hundreds of columns. How can this lagged difference from one column to the next be computed in vectorized form for all columns?
I.e., if the tibble has lead1 to lead100, the result would be lag2 to lag100.
The statement
the_df %>%
mutate(across(lead1:lead4, ~ dplyr::lag(.x,1), .names="d_{.col}"))
computes the lag of each column on its own and stores the result in a new column. How do I calculate the difference between two adjacent columns inside across(), as in the simple four-column example, but for all columns?
Update: a one-liner:
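A minimal sketch of one way to do this in a single mutate() call, not necessarily the exact one-liner originally posted. It assumes the lead columns sit next to each other in the tibble, so that the "previous" column is simply the one to the left of the current column; the data are rebuilt with base as.Date() only to keep the example self-contained.

```r
library(dplyr)
library(tibble)

set.seed(0)
the_df <- tibble(
  date  = seq(as.Date("2023-01-01"), by = "day", length.out = 10),
  lead1 = 1:10,
  lead2 = runif(10),
  lead3 = runif(10),
  lead4 = runif(10)
)

result <- the_df %>%
  mutate(across(
    lead2:lead4,
    # cur_column() is the name of the column being processed; the column
    # one position to its left is assumed to be the previous lead column.
    ~ .x - dplyr::lag(the_df[[match(cur_column(), names(the_df)) - 1]], 1),
    # Rename lead2 -> lag2, lead3 -> lag3, ...
    .names = "{sub('lead', 'lag', .col)}"
  ))
```

Because the lookup is positional, reordering the columns would change which pairs are subtracted; with hundreds of columns, replace lead2:lead4 with the appropriate range.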
Original:
Make the data long, then get the name of the previous column:
Left join this intermediate data frame with itself to bring in the previous column's value as a new column. Then calculate each value minus the previous column's lagged value, and drop the superfluous columns.
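The two steps described above could look roughly like the following. This is a sketch under assumptions: the helper columns idx, prev_name and prev_value are names chosen here for illustration, and the previous column is derived from the numeric suffix of the column name.

```r
library(dplyr)
library(tidyr)
library(tibble)

set.seed(0)
the_df <- tibble(
  date  = seq(as.Date("2023-01-01"), by = "day", length.out = 10),
  lead1 = 1:10,
  lead2 = runif(10),
  lead3 = runif(10),
  lead4 = runif(10)
)

# Step 1: long format, plus the name of the previous lead column.
long_df <- the_df %>%
  pivot_longer(starts_with("lead")) %>%
  mutate(
    idx       = as.integer(sub("lead", "", name)),
    prev_name = paste0("lead", idx - 1)
  )

# Step 2: self-join to fetch the previous column's value for the same date,
# then subtract that value lagged by one row within each column.
lagged_long <- long_df %>%
  left_join(
    select(long_df, date, prev_name = name, prev_value = value),
    by = c("date", "prev_name")
  ) %>%
  group_by(name) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(value = value - dplyr::lag(prev_value, 1)) %>%
  ungroup() %>%
  filter(idx > 1) %>%                      # lead1 has no previous column
  mutate(name = paste0("lag", idx)) %>%
  select(date, name, value)                # drop the helper columns
```

The join key never matches for lead1 (there is no lead0), which is why those rows are filtered out before renaming.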
To make the data wide again:
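As an illustration of the reshaping step, here is a small sketch using tidyr::pivot_wider(). The long table long_result is a hand-made, hypothetical stand-in for the intermediate result (its values are made up for the example and are not the output of the code above).

```r
library(tidyr)
library(tibble)

# Hypothetical long-format result: one row per date and lag column.
long_result <- tibble(
  date  = rep(as.Date("2023-01-01") + 0:1, each = 2),
  name  = rep(c("lag2", "lag3"), times = 2),
  value = c(NA, NA, -0.5, 0.25)
)

# One column per lag name, one row per date.
wide_result <- pivot_wider(long_result, names_from = name, values_from = value)
```

If the original lead columns are needed alongside the lag columns, the wide table can be joined back to the source tibble by date.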
Output: