I have a data set of items downloaded from a website based on reports we generate. The idea is to remove reports that are no longer needed, based on the number of downloads. The logic is basically: count all the reports that have been downloaded in the last year, check whether they fall outside two absolute deviations around the median for the current year, then check whether the report has been downloaded within the last 4 weeks and, if so, how many times.
I have the code below, which doesn't work, and I was wondering if anyone can help. It gives me this error for the n_recent_downloads section:
Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables
reports <- c("Report_A","Report_B","Report_C","Report_D","Report_A","Report_A","Report_A","Report_D","Report_D","Report_D")
Week_no <- c(36,36,33,32,20,18,36,30,29,27)
New.Downloads <- data.frame (Report1 = reports, DL.Week = Week_no)
test <- New.Downloads %>%
group_by(Report1) %>%
summarise(n_downloads = n(),
n_recent_downloads = ifelse(sum((as.integer(DL.Week) >= (as.integer(max(DL.Week))) - 4),value,0)))
Providing a reproducible example would make life a lot easier. Nonetheless, I have modified your code to do what I think you were trying to achieve.
I've split it into two so you can see what is going on. I moved the ifelse statement to a mutate call, which gives a check column for each row.
Note that from your example none of the values are classed as extreme relative to the median + 2 * mad criterion, so the check values are identical to DL.Week.
You can then chain a summarise onto the end of this to give you the sums.
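For reference, here is a minimal sketch of how the two steps could look with the sample data from the question. The exact form of the check is an assumption on my part: I treat a week as extreme when it lies more than 2 * mad from the overall median (using R's default scaled mad), and the names step1, recent and result are just illustrative. Adjust the criterion if you want it applied per report instead.

```r
library(dplyr)

# Sample data from the question
reports <- c("Report_A", "Report_B", "Report_C", "Report_D", "Report_A",
             "Report_A", "Report_A", "Report_D", "Report_D", "Report_D")
Week_no <- c(36, 36, 33, 32, 20, 18, 36, 30, 29, 27)
New.Downloads <- data.frame(Report1 = reports, DL.Week = Week_no)

# Step 1: row-level flags in a mutate call.
# Assumption: "extreme" means more than 2 * mad away from the overall median.
step1 <- New.Downloads %>%
  mutate(
    # Keep the week if it is not extreme, otherwise mark it as missing
    check = ifelse(abs(DL.Week - median(DL.Week)) > 2 * mad(DL.Week),
                   NA_real_, DL.Week),
    # 1 if the download falls within the 4 weeks up to the latest week, else 0
    recent = ifelse(DL.Week >= max(DL.Week) - 4, 1, 0)
  )

# Step 2: chain a summarise to get the per-report sums
result <- step1 %>%
  group_by(Report1) %>%
  summarise(
    n_downloads = n(),
    n_recent_downloads = sum(recent)
  )

print(result)
```

The key difference from the original attempt is that the comparison happens row by row inside mutate, so summarise only ever sums a plain numeric column rather than passing extra objects to sum().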