How to add "slope" function (excel) in R

71 views Asked by At

I'm trying to move data wrangling from excel to R. I have a dataframe (df) with 62 columns and 21 rows. Each of the rows represents a different organisation, and each of the column headings (apart from the first one which gives organisation name) are monthly dates.

The structure looks like this:

structure(list(OrgCode = c("RA", "RB", "RC", "RD"), Provider = c("ORGA", 
"ORGB", "ORGC", 
"ORGD"), `Jan-22` = c(0.54, 
0.78, 0.715399610136452, 0.37), `Feb-22` = c(0.14, 
0.43, 0.65, 0.87)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

My excel spreadsheet uses the slope() function to take the 3 most recent dates and calculate the slope of them. I want to create a new column in my df in R which contains the slope of the 3 last columns of my df but I can't figure out how to do this.

I've seen some code suggesting taking a linear regression for each provider:

mod <- lm(y ~ x)
cf <- coef(mod)
Slope <- cf[2]

but I don't know how to save a model (which only uses the last three dates not all the dates available in the dataframe) in a column and then create another column which takes the date of this - seems like there should be a simple function which exists.

Any help greatly appreciated.

2

There are 2 answers

0
Grzegorz Sapijaszko On

A lazy approach.. slope function takes 3 last columns - you must make sure that's the proper one with last dates.

df <- structure(
  list(
    OrgCode = c("RA", "RB", "RC", "RD"),
    Provider = c("ORGA", "ORGB", "ORGC", "ORGD"),
    `Jan-22` = c(0.54, 0.78, 0.715399610136452, 0.37),
    `Feb-22` = c(0.14, 0.43, 0.65, 0.87),
    `Mar-22` = c(0.54, 0.78, 0.715399610136452, 0.37),
    `Apr-22` = c(0.84, 1.43, 0.9, 0.07)
  ),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame")
)

slope <- function(df = "", row_number = 1L) {
  df <- df[,c((ncol(df)-2):ncol(df))]
  sl <- lm(c(df[row_number, 1][[1]], df[row_number, 2][[1]], df[row_number, 3][[1]]) ~ c(1, 2, 3)) |>
    coef() |>
    purrr::pluck(2)
  return(sl)
}

slope(df, 1)
#> [1] 0.35

slope(df, 3)
#> [1] 0.125

sla <- list()
for (i in 1:nrow(df)) {
  sla <- slope(df, i) |>
    rbind(sla)
}
df$slope <- rev(unlist(sla))
df
#>   OrgCode Provider    Jan-22 Feb-22    Mar-22 Apr-22  slope
#> 1      RA     ORGA 0.5400000   0.14 0.5400000   0.84  0.350
#> 2      RB     ORGB 0.7800000   0.43 0.7800000   1.43  0.500
#> 3      RC     ORGC 0.7153996   0.65 0.7153996   0.90  0.125
#> 4      RD     ORGD 0.3700000   0.87 0.3700000   0.07 -0.400

Created on 2024-02-29 with reprex v2.1.0

0
G. Grothendieck On

We can define slope as shown below.

For example if we want to calculate the slope separately on each row for the last n=2 columns:

library(dplyr)

slope <- function(y, x = seq_along(y)) cov(x, y) / var(x)

n <- 2  # take slope of last n columns on each row
DF %>% 
  rowwise %>%
  mutate(slope = slope(c_across(seq(to = last_col(), length = n)))) %>%
  ungroup

giving

# A tibble: 4 × 5
  OrgCode Provider `Jan-22` `Feb-22`   slope
  <chr>   <chr>       <dbl>    <dbl>   <dbl>
1 RA      ORGA        0.54      0.14 -0.4   
2 RB      ORGB        0.78      0.43 -0.35  
3 RC      ORGC        0.715     0.65 -0.0654
4 RD      ORGD        0.37      0.87  0.5   

Note

The input DF from the question is

DF <- 
structure(list(OrgCode = c("RA", "RB", "RC", "RD"), Provider = c("ORGA", 
"ORGB", "ORGC", 
"ORGD"), `Jan-22` = c(0.54, 
0.78, 0.715399610136452, 0.37), `Feb-22` = c(0.14, 
0.43, 0.65, 0.87)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))