I wrote a function that checks whether any methane concentration exceeds 2.5 ppm; if so, it fits a linear regression of methane against time and checks whether the slope is significant (p-value < 0.05) and positive.
I have a large dataset covering several wells, so I group the data by well and want to apply the function to each well (i.e. for Well_A, regress methane on time and return TRUE if the slope is significant). Everything works, but I keep getting a warning that cur_data() is deprecated. The code still runs, but I'd like a version without it, in case I keep using this code for a long time and it stops working at some point. The problem is that whatever I try, I cannot get it right. Using the dot (.) makes the code run, but it does not seem to loop over each well.
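A small toy example (the data frame `toy` and its columns `g` and `x` are hypothetical) illustrates the difference. Inside a grouped `summarise()`, the magrittr dot is bound to the whole piped data frame, not to the current group, whereas `pick(everything())` (available from dplyr 1.1.0) returns only the current group's rows, without the grouping column. This is a sketch assuming dplyr >= 1.1.0 is installed.

```r
library(dplyr)

# Hypothetical toy data: group "a" has 2 rows, group "b" has 1 row
toy <- tibble(g = c("a", "a", "b"), x = 1:3)

out <- toy %>%
  group_by(g) %>%
  summarise(
    rows_dot  = nrow(.),                   # `.` is the entire piped tibble, every group
    rows_pick = nrow(pick(everything())),  # only the rows of the current group
    .groups = "drop"
  )
print(out)
```

Because `.` always sees all wells, the question's function regresses the pooled data of every well each time, which is why every group gets the same answer.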
Here is a reproducible example with 4 wells; only one of them (Well_D) has a significant slope. With cur_data() the result is correct, but if I use the dot, every well is reported as having a significant slope. I tried other solutions and none of them worked.
library(tibble)
library(dplyr)
###Data
df <- tibble(
  WELL_NAME = rep(c("Well_A", "Well_B", "Well_C", "Well_D"), each = 5),
  FIELD_TIME = rep(c(0, 2, 4, 6, 8), times = 4),
  CH4_PPM = c(2.0, 2.1, 2.0, 2.2, 2.1,  # Data for Well_A
              2.0, 2.2, 2.4, 2.5, 2.3,  # Data for Well_B
              1.8, 1.9, 1.8, 1.9, 1.8,  # Data for Well_C
              1.5, 1.7, 2.2, 2.8, 3.3)  # Data for Well_D
)
### Function
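check_methane_slope_significance <- function(x) {
  # Only test wells where at least one reading exceeds 2.5 ppm
  if (any(x$CH4_PPM > 2.5)) {
    # Regress methane concentration on time
    lm_result <- lm(CH4_PPM ~ FIELD_TIME, data = x)
    # p-value of the FIELD_TIME coefficient (row 2, column 4 = "Pr(>|t|)")
    p_value <- summary(lm_result)$coefficients[2, 4]
    slope <- coef(lm_result)[["FIELD_TIME"]]
    # Significant only if p < 0.05 and the slope is positive
    if (p_value < 0.05 && slope > 0) {
      return(TRUE)
    } else {
      return(FALSE)
    }
  } else {
    return(FALSE)
  }
}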
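One way to confirm that the function itself is correct, and that only the way the data is passed is at fault, is to call it on each well's rows directly with base R's `split()` and `vapply()`, with no dplyr grouping involved. This is a sketch; the function below is a condensed copy of the one above with the same logic.

```r
library(dplyr)

df <- tibble(
  WELL_NAME = rep(c("Well_A", "Well_B", "Well_C", "Well_D"), each = 5),
  FIELD_TIME = rep(c(0, 2, 4, 6, 8), times = 4),
  CH4_PPM = c(2.0, 2.1, 2.0, 2.2, 2.1,
              2.0, 2.2, 2.4, 2.5, 2.3,
              1.8, 1.9, 1.8, 1.9, 1.8,
              1.5, 1.7, 2.2, 2.8, 3.3)
)

# Condensed copy of the question's function (same logic)
check_methane_slope_significance <- function(x) {
  if (!any(x$CH4_PPM > 2.5)) return(FALSE)  # no reading above 2.5 ppm
  fit <- lm(CH4_PPM ~ FIELD_TIME, data = x)
  p_value <- summary(fit)$coefficients["FIELD_TIME", "Pr(>|t|)"]
  slope <- coef(fit)[["FIELD_TIME"]]
  p_value < 0.05 && slope > 0               # significant and increasing
}

# One tibble per well, then one TRUE/FALSE per well
per_well <- split(df, df$WELL_NAME)
results <- vapply(per_well, check_methane_slope_significance, logical(1))
print(results)
```

Only Well_D should come out TRUE: Well_A, Well_B and Well_C never exceed 2.5 ppm (Well_B peaks at exactly 2.5).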
##### CODE WITH CUR_DATA WHICH WORKS####
significant_slopes <- df %>%
  group_by(WELL_NAME) %>%
  summarise(has_significant_slope = any(check_methane_slope_significance(cur_data())), .groups = "drop")
significant_slopes <- significant_slopes %>%
  filter(has_significant_slope)
print(significant_slopes)
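dplyr 1.1.0 deprecated `cur_data()` in favour of `pick()`. A minimal sketch of the replacement, assuming dplyr >= 1.1.0: `pick(everything())` hands the function the current group's columns and, like `cur_data()`, leaves out the grouping column `WELL_NAME`. The `any()` wrapper is dropped here because the function already returns a single TRUE/FALSE.

```r
library(dplyr)

df <- tibble(
  WELL_NAME = rep(c("Well_A", "Well_B", "Well_C", "Well_D"), each = 5),
  FIELD_TIME = rep(c(0, 2, 4, 6, 8), times = 4),
  CH4_PPM = c(2.0, 2.1, 2.0, 2.2, 2.1,
              2.0, 2.2, 2.4, 2.5, 2.3,
              1.8, 1.9, 1.8, 1.9, 1.8,
              1.5, 1.7, 2.2, 2.8, 3.3)
)

# Condensed copy of the question's function (same logic)
check_methane_slope_significance <- function(x) {
  if (!any(x$CH4_PPM > 2.5)) return(FALSE)
  fit <- lm(CH4_PPM ~ FIELD_TIME, data = x)
  p_value <- summary(fit)$coefficients["FIELD_TIME", "Pr(>|t|)"]
  p_value < 0.05 && coef(fit)[["FIELD_TIME"]] > 0
}

significant_slopes <- df %>%
  group_by(WELL_NAME) %>%
  summarise(
    # pick() returns only the current well's rows, unlike `.`
    has_significant_slope = check_methane_slope_significance(pick(everything())),
    .groups = "drop"
  ) %>%
  filter(has_significant_slope)
print(significant_slopes)
```

If only some columns are needed, `pick(CH4_PPM, FIELD_TIME)` passes just those and makes the function's dependencies explicit.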
###### CODE WITHOUT CUR_DATA, RETURNS TRUE FOR EVERY WELL
significant_slopes <- df %>%
  group_by(WELL_NAME) %>%
  summarise(has_significant_slope = any(check_methane_slope_significance(.)), .groups = "drop")
significant_slopes <- significant_slopes %>%
  filter(has_significant_slope)
print(significant_slopes)
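Another option that avoids both `cur_data()` and the dot is `group_modify()`, which calls a function once per group and passes that group's rows (without the grouping column) as `.x`. Since `group_modify()` requires the function to return a data frame, this sketch wraps the result in a one-row tibble; the wrapping is an assumption of this example, not part of the original code.

```r
library(dplyr)

df <- tibble(
  WELL_NAME = rep(c("Well_A", "Well_B", "Well_C", "Well_D"), each = 5),
  FIELD_TIME = rep(c(0, 2, 4, 6, 8), times = 4),
  CH4_PPM = c(2.0, 2.1, 2.0, 2.2, 2.1,
              2.0, 2.2, 2.4, 2.5, 2.3,
              1.8, 1.9, 1.8, 1.9, 1.8,
              1.5, 1.7, 2.2, 2.8, 3.3)
)

# Condensed copy of the question's function (same logic)
check_methane_slope_significance <- function(x) {
  if (!any(x$CH4_PPM > 2.5)) return(FALSE)
  fit <- lm(CH4_PPM ~ FIELD_TIME, data = x)
  p_value <- summary(fit)$coefficients["FIELD_TIME", "Pr(>|t|)"]
  p_value < 0.05 && coef(fit)[["FIELD_TIME"]] > 0
}

significant_slopes <- df %>%
  group_by(WELL_NAME) %>%
  # .x is one well's rows; WELL_NAME is added back automatically
  group_modify(~ tibble(has_significant_slope = check_methane_slope_significance(.x))) %>%
  ungroup() %>%
  filter(has_significant_slope)
print(significant_slopes)
```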