Conditional subtraction Part 2

114 views Asked by At

I have a big data.frame (TOTAL) with some values (cols11-16) that I need to subtract a base from which is multiplied by a value depending on two conditions in TOTAL.

The data.frame (TOTAL) looks a little like this

Channel    Hour      Category cols11 cols12 cols13 cols14 cols15 base
TV1        01:00:00  New      2      5      4      5      6      2.4
TV5        23:00:00  Old      1      5      3      9      7      1.8
TV1        02:00:00  New      8      7      9      2      4      5.4

There are 4 different channels, and 24 different hours (00:00:00-23:00:00)

I have four other vectors with the conditionedvariable that needs to be multiplied on the base depending on the Hour and the Channel, So for each channel I have a vector like this:

TV1Slope:
TV1Slope00 TV1Slope01 TV1Slope02.. TV1Slope23
 0.0012      0.0015    0.013       0.0009

TV5Slope:
TV5Slope00 TV5Slope01 TV5Slope02.. TV5Slope23
0.0032      0.0023    0.016       0.002

TOTAL$Uplift0 <- (TOTAL$cols11 - TOTAL$base * conditionedvariable)
TOTAL$Uplift1 <- (TOTAL$cols12 - TOTAL$base * conditionedvariable)
TOTAL$Uplift2 <- (TOTAL$cols13 - TOTAL$base * conditionedvariable)
TOTAL$Uplift3 <- (TOTAL$cols14 - TOTAL$base * conditionedvariable)
TOTAL$Uplift4 <- (TOTAL$cols15 - TOTAL$base * conditionedvariable)

How do I make R choose the conditioned variable based on the conditions?

For example:

For TOTAL$Uplift0 I will get:

 cols11 - base * conditionedvariable

For the first row where Channel is TV1 and hour is 01:00:00: 2 - 2.4 *0.0015 For the second row where Channel is TV5 and hour is 23:00:00: 1 - 1.8 *0.002 For the third row where Channel is TV1 and hour is 02:00:00: 8 - 5.4 *0.013

1

There are 1 answers

0
akrun On BEST ANSWER

We paste the 'Channel' and substring of 'Hour' column together ('nm1'), concatenate the 'TV1Slope' and 'TV5Slope' vectors ('TV15'), match the 'nm1 vector with names of 'TV15' after removing the 'Slope' substring with sub, and get the corresponding 'TV15' value. Subset the columns with names starting with 'cols' using grep, do the calculation, and assign it to new columns ('nm2').

nm1 <- with(TOTAL, paste0(Channel, substr(Hour, 1,2)))
TV15 <- c(TV1Slope, TV5Slope)
val <- TV15[match(nm1, sub('Slope', '', names(TV15)))]
indx <- grep('^cols', names(TOTAL))
nm2 <- paste0('Uplift',seq_along(indx)-1)
TOTAL[nm2] <- TOTAL[indx]-(TOTAL$base*val)
TOTAL
#  Channel     Hour Category cols11 cols12 cols13 cols14 cols15 base   Uplift0
#1     TV1 01:00:00      New      2      5      4      5      6  2.4 1.9946026
#2     TV5 23:00:00      Old      1      5      3      9      7  1.8 0.9823184
#3     TV1 02:00:00      New      8      7      9      2      4  5.4 7.9619720
#   Uplift1  Uplift2  Uplift3  Uplift4
#1 4.994603 3.994603 4.994603 5.994603
#2 4.982318 2.982318 8.982318 6.982318
#3 6.961972 8.961972 1.961972 3.961972

NOTE: Created reproducible 'TV1Slope' and 'TV5Slope' examples

data

TOTAL <- structure(list(Channel = c("TV1", "TV5", "TV1"), Hour = c("01:00:00", 
"23:00:00", "02:00:00"), Category = c("New", "Old", "New"), cols11 = c(2L, 
1L, 8L), cols12 = c(5L, 5L, 7L), cols13 = c(4L, 3L, 9L), cols14 = c(5L, 
9L, 2L), cols15 = c(6L, 7L, 4L), base = c(2.4, 1.8, 5.4)), .Names = c("Channel", 
"Hour", "Category", "cols11", "cols12", "cols13", "cols14", "cols15", 
"base"), class = "data.frame", row.names = c(NA, -3L))

set.seed(24)
TV1Slope <- setNames(runif(24)/100, sprintf('TV1Slope%02d', 0:23))
set.seed(29)
TV5Slope <- setNames(runif(24)/100, sprintf('TV5Slope%02d', 0:23))