Each day I run summary report to get a list of the files that were imported and used in a certain project output. Each day's log is stamped with current day's date (Log.Date) and they are read into a running file log.
I want to create a check for this log that will let me know that the same number of historical files are being read in from one day to the next. (i.e. today's imports should equal yesterday's imports, plus a few new files from today.)
Example log:
fileLog <- data.frame('Log.Date'=c('2020-08-01','2020-08-01','2020-08-02','2020-08-02','2020-08-02','2020-08-03','2020-08-03','2020-08-03','2020-08-03'),
'System' = c('A','B','A','B','C','A','B','C','D'),
'File'=c('file1','file2','file1','file2','file3', 'file1', 'file2','file3','file4'))
# Log.Date System File
# 1 2020-08-01 A file1
# 2 2020-08-01 B file2
# 3 2020-08-02 A file1
# 4 2020-08-02 B file2
# 5 2020-08-02 C file3
# 6 2020-08-03 A file1
# 7 2020-08-03 B file2
# 8 2020-08-03 C file3
# 9 2020-08-03 D file4
I group the file log by the Log.Date and System to get a file count for each day, then pivot the data so that I can see and calculate any change from the previous day. I want to create the calculation by position: Change = 'last column' minus 'next to last column', since the column names will always be changing.
fileLog <- fileLog %>%
arrange(Log.Date) %>%
group_by(Log.Date, System) %>%
summarise(File.Count = length(unique(File))) %>%
ungroup() %>%
pivot_wider(names_from = Log.Date, values_from = File.Count) %>%
replace(is.na(.), 0)
fileLog <- as.data.frame(fileLog) %>%
mutate(Change = rev(fileLog)[1] - rev(fileLog)[2])
This does let me view the change I'm looking for, however the resulting 'Change' column is of class 'data.frame'. This causes me problems for exporting it as part of a report.
# System 2020-08-01 2020-08-02 2020-08-03 Change.2020-08-03
# 1 A 1 1 1 0
# 2 B 1 1 1 0
# 3 C 0 1 1 0
# 4 D 0 0 1 1
> class(fileLog$Change)
[1] "data.frame"
Using the absolute column names works fine:
mutate(Change = fileLog$'2020-08-03' - fileLog$'2020-08-02')
I've tried other iterations of calling the position, which nets the same problem
mutate(Change = fileLog[,ncol(fileLog)] - fileLog[,ncol(fileLog)-1])
I've also tried coercing the column to numeric, gives the error: 'list' object cannot be coerced to type 'double'
mutate(Change = as.numeric(check_start[,-1]) - as.numeric(check_start[,-2]))
mutate(Change = as.numeric(rev(check_start)[1]) - as.numeric(rev(check_start)[2]))
Question: Can someone suggest the correct way to make this positional calculation to output a 'Change' column that is not of class=data.frame?
Few changes to your previous step :
Instead of
length(unique(File))
, usedn_distinct
Instead of adding another
replace
step afterpivot_wider
usevalues_fill
.Now to subtract value from last and second last column you can use
ncol
which will give you column number of last column.