R calculation of new column value from other columns, using relative positions. Why does the class of the resulting column = data.frame?

77 views Asked by At

Each day I run summary report to get a list of the files that were imported and used in a certain project output. Each day's log is stamped with current day's date (Log.Date) and they are read into a running file log.

I want to create a check for this log that will let me know that the same number of historical files are being read in from one day to the next. (i.e. today's imports should equal yesterday's imports, plus a few new files from today.)

Example log:

fileLog <- data.frame('Log.Date'=c('2020-08-01','2020-08-01','2020-08-02','2020-08-02','2020-08-02','2020-08-03','2020-08-03','2020-08-03','2020-08-03'),
                     'System' = c('A','B','A','B','C','A','B','C','D'),
                     'File'=c('file1','file2','file1','file2','file3', 'file1', 'file2','file3','file4')) 

#     Log.Date     System   File
# 1   2020-08-01      A     file1
# 2   2020-08-01      B     file2
# 3   2020-08-02      A     file1
# 4   2020-08-02      B     file2
# 5   2020-08-02      C     file3
# 6   2020-08-03      A     file1
# 7   2020-08-03      B     file2
# 8   2020-08-03      C     file3
# 9   2020-08-03      D     file4

I group the file log by the Log.Date and System to get a file count for each day, then pivot the data so that I can see and calculate any change from the previous day. I want to create the calculation by position: Change = 'last column' minus 'next to last column', since the column names will always be changing.

fileLog <- fileLog %>% 
  arrange(Log.Date) %>% 
  group_by(Log.Date, System) %>% 
  summarise(File.Count = length(unique(File))) %>% 
  ungroup() %>% 
  pivot_wider(names_from = Log.Date, values_from = File.Count) %>% 
  replace(is.na(.), 0)
fileLog <- as.data.frame(fileLog) %>% 
  mutate(Change = rev(fileLog)[1] - rev(fileLog)[2]) 

This does let me view the change I'm looking for, however the resulting 'Change' column is of class 'data.frame'. This causes me problems for exporting it as part of a report.

#        System  2020-08-01  2020-08-02  2020-08-03  Change.2020-08-03
# 1        A          1          1          1          0
# 2        B          1          1          1          0
# 3        C          0          1          1          0
# 4        D          0          0          1          1

> class(fileLog$Change)
[1] "data.frame"

Using the absolute column names works fine:

mutate(Change = fileLog$'2020-08-03' - fileLog$'2020-08-02')

I've tried other iterations of calling the position, which nets the same problem

mutate(Change = fileLog[,ncol(fileLog)] - fileLog[,ncol(fileLog)-1])

I've also tried coercing the column to numeric, gives the error: 'list' object cannot be coerced to type 'double'

  mutate(Change = as.numeric(check_start[,-1]) - as.numeric(check_start[,-2]))
  mutate(Change = as.numeric(rev(check_start)[1]) - as.numeric(rev(check_start)[2])) 

Question: Can someone suggest the correct way to make this positional calculation to output a 'Change' column that is not of class=data.frame?

1

There are 1 answers

3
Ronak Shah On BEST ANSWER

Few changes to your previous step :

  1. Instead of length(unique(File)), used n_distinct

  2. Instead of adding another replace step after pivot_wider use values_fill.

fileLog <- fileLog %>% 
  arrange(Log.Date) %>% 
  group_by(Log.Date, System) %>% 
  summarise(File.Count = n_distinct(File)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = Log.Date, values_from = File.Count, values_fill = 0) 

Now to subtract value from last and second last column you can use ncol which will give you column number of last column.

fileLog <- fileLog %>% mutate(Change = .[[ncol(.)]] - .[[ncol(.) - 1]]) 
fileLog
# A tibble: 4 x 5
  System `2020-08-01` `2020-08-02` `2020-08-03` Change
  <chr>         <int>        <int>        <int>  <int>
1 A                 1            1            1      0
2 B                 1            1            1      0
3 C                 0            1            1      0
4 D                 0            0            1      1