How to sum ifelse statements on the fly with [R]


I have an R conundrum and would be very grateful for any assistance. I need to write a piece of code that has to fit on one line so that it slots into a larger automated process. I have supplied some dummy data to help illustrate.

I have three ifelse statements that return 1’s or 0’s. I need to sum these 1’s and 0’s, but because of other inherited constraints in my real data I can’t refer to their output and then sum them. I need to sum them on the fly.

To be explicit: I cannot refer to the 1’s and 0’s output by any of ‘use_sms’, ‘use_data’ or ‘use_voice’, and I cannot just run apply(..., 1, sum) over the data frame.

Somehow, what I need is a fully self-contained sum of the three ifelse statements, something along the lines of this (in crude, non-R pseudocode):

sum(
ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)
) 
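One detail worth keeping in mind with the pseudocode: in R, `sum()` adds up every element of every argument and returns a single number, whereas `+` works element-wise and returns one value per position (here, per row). A small illustration, where `a` and `b` are made-up 0/1 vectors standing in for two ifelse() results, not columns from the real data:

```r
# Two toy 0/1 indicator vectors standing in for two ifelse() results
a <- c(1, 0, 1)
b <- c(0, 0, 1)

total_all  <- sum(a, b)  # collapses everything into a single scalar
total_rows <- a + b      # vectorised: one total per position (row)

print(total_all)   # [1] 3
print(total_rows)  # [1] 1 0 2
```

So a per-row total of several ifelse() results is usually written with `+` rather than wrapped in `sum()`.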

My real data is presented to me in a form similar to this headache_df:

set.seed(1234)  # set the seed before sampling so the dummy data are reproducible
headache_df = data.frame(sms_rev0 = sample(1:0, 10, replace = TRUE),
                        sms_cnt0 = sample(1:0, 10, replace = TRUE),
                        sms_rev1 = sample(1:0, 10, replace = TRUE),
                        sms_cnt1 = sample(1:0, 10, replace = TRUE),
                        sms_rev2 = sample(1:0, 10, replace = TRUE),
                        sms_cnt2 = sample(1:0, 10, replace = TRUE),
                        data_rev0 = sample(1:0, 10, replace = TRUE),
                        data_cnt0 = sample(1:0, 10, replace = TRUE),
                        data_rev1 = sample(1:0, 10, replace = TRUE),
                        data_cnt1 = sample(1:0, 10, replace = TRUE),
                        data_rev2 = sample(1:0, 10, replace = TRUE),
                        data_cnt2 = sample(1:0, 10, replace = TRUE),
                        voice_rev0 = sample(1:0, 10, replace = TRUE),
                        voice_cnt0 = sample(1:0, 10, replace = TRUE),
                        voice_rev1 = sample(1:0, 10, replace = TRUE),
                        voice_cnt1 = sample(1:0, 10, replace = TRUE),
                        voice_rev2 = sample(1:0, 10, replace = TRUE),
                        voice_cnt2 = sample(1:0, 10, replace = TRUE))

row.names(headache_df) = paste0("row", 1:10)

And I am looking to capture my results in this headache-combating panado_df:

panado_df = data.frame(user = row.names(headache_df))
attach(headache_df)

I generate three ifelse statements to illustrate, but in my real data it’s really the sum of these that I need to capture.

panado_df$use_sms = ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0)
panado_df$use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0)
panado_df$use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)
rownames(panado_df) = panado_df$user
panado_df$user = NULL
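For reference, R's operator precedence makes `>` bind more tightly than `&`, which in turn binds more tightly than `|`, so each condition above reads as `(rev0 & (cnt0 > 0)) | (rev1 & (cnt1 > 0)) | ...`, with any nonzero `rev` value counting as `TRUE`. Because that result is already logical, `as.integer()` gives the same 0/1 values as `ifelse(..., 1, 0)`. A quick check on made-up vectors (the names `rev0`, `cnt0`, etc. are illustrative only):

```r
# Made-up 0/1 vectors for two periods
rev0 <- c(1, 0, 1, 0); cnt0 <- c(1, 1, 0, 0)
rev1 <- c(0, 1, 0, 0); cnt1 <- c(0, 1, 1, 0)

# The condition written as in the question
flag_ifelse <- ifelse(rev0 & cnt0 > 0 | rev1 & cnt1 > 0, 1, 0)

# The same condition with the implicit grouping spelled out
flag_explicit <- ifelse((rev0 != 0 & cnt0 > 0) | (rev1 != 0 & cnt1 > 0), 1, 0)

# Logical-to-integer conversion instead of ifelse()
flag_int <- as.integer(rev0 & cnt0 > 0 | rev1 & cnt1 > 0)

print(flag_ifelse)  # [1] 1 1 0 0
```

`flag_int` is an integer vector while `flag_ifelse` is double, so the two compare equal with `==` but are not `identical()`.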

I include a target column to illustrate what my calculated data should look like. Any neat solutions to achieve this, please?

panado_df$target_column = apply(panado_df, 1, sum)
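As an aside, base R's `rowSums()` is a vectorised (and typically faster) alternative to `apply(..., 1, sum)` for numeric columns. A minimal sketch on a toy frame shaped like `panado_df`; the values and the name `indicator_cols` are invented for illustration:

```r
# Toy frame shaped like panado_df (values invented for illustration)
toy <- data.frame(use_sms   = c(1, 0, 1),
                  use_data  = c(1, 1, 0),
                  use_voice = c(1, 0, 0),
                  row.names = c("row1", "row2", "row3"))

indicator_cols <- c("use_sms", "use_data", "use_voice")

# Vectorised row totals; same values as apply(toy[indicator_cols], 1, sum)
toy$target_column <- rowSums(toy[indicator_cols])
print(toy)
```

Selecting the indicator columns by name also avoids accidentally summing any other numeric columns that end up in the frame.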

mabdrabo (accepted answer)

If I understand you correctly, you might be looking for something like this. Because sum() would collapse all three vectors into a single number, the ifelse results are added with + so that each row gets its own total:

panado_df$sums_3 <- ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0) +
    ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0) +
    ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)

Your code could also be made more descriptive (just as you wrote it) using dplyr, as follows:

library(dplyr)

pando_df <- headache_df %>%
    mutate(use_sms=ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
        use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
        use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)) %>%
    rowwise() %>%
    mutate(target_column=sum(use_sms, use_data, use_voice))

If you'd like to return the target_column vector directly, load the magrittr package as well (it provides the %$% exposition pipe) and try the following:

library(dplyr)
library(magrittr)  # for the %$% exposition pipe

pando_df <- headache_df %>%
    mutate(use_sms=ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
        use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
        use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)) %>%
    rowwise() %>%
    mutate(target_column=sum(use_sms, use_data, use_voice)) %$%
    target_column
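If loading magrittr is not an option, base R's `with()` evaluates an expression with a data frame's columns in scope, much like the `%$%` pipe, and without the side effects of `attach()`. A sketch that returns the per-row total directly, using a small made-up frame with only period-0 columns for brevity (the real data would add the `_rev1`/`_cnt1` and `_rev2`/`_cnt2` terms):

```r
# Made-up data with only period-0 columns, for brevity
df <- data.frame(sms_rev0   = c(1, 0, 1), sms_cnt0   = c(1, 1, 1),
                 data_rev0  = c(0, 1, 1), data_cnt0  = c(1, 1, 0),
                 voice_rev0 = c(1, 1, 0), voice_cnt0 = c(1, 0, 0))

# One expression, no intermediate use_* columns: each ifelse() yields a 0/1 vector
# and + adds them row by row
target <- with(df,
  ifelse(sms_rev0 & sms_cnt0 > 0, 1, 0) +
  ifelse(data_rev0 & data_cnt0 > 0, 1, 0) +
  ifelse(voice_rev0 & voice_cnt0 > 0, 1, 0))

print(target)  # [1] 2 1 1
```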
Jean
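headache_df <- within(headache_df, {
    # & and | treat any nonzero value as TRUE, so the "> 0" checks can be dropped
    # for the nonnegative counts here; as.integer() converts TRUE/FALSE to 1/0
    use_sms   <- as.integer(sms_rev0 & sms_cnt0 | sms_rev1 & sms_cnt1 | sms_rev2 & sms_cnt2)
    use_data  <- as.integer(data_rev0 & data_cnt0 | data_rev1 & data_cnt1 | data_rev2 & data_cnt2)
    use_voice <- as.integer(voice_rev0 & voice_cnt0 | voice_rev1 & voice_cnt1 | voice_rev2 & voice_cnt2)
    # within() evaluates these assignments in order, so the flags are available here
    target <- use_sms + use_data + use_voice
})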
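When the services and periods follow a regular naming pattern (`<service>_rev<i>` / `<service>_cnt<i>`), the three conditions can also be built programmatically, so the total becomes a single expression that never names intermediate `use_*` columns. This is a sketch assuming that naming pattern holds in the real data; `uses_service()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper (assumes columns named <prefix>_rev<i> and <prefix>_cnt<i>):
# returns 1 for rows where any period has rev != 0 and cnt > 0, else 0
uses_service <- function(prefix, df, periods = 0:2) {
  hits <- lapply(periods, function(i) {
    df[[paste0(prefix, "_rev", i)]] != 0 & df[[paste0(prefix, "_cnt", i)]] > 0
  })
  as.integer(Reduce("|", hits))  # OR across periods, then TRUE/FALSE -> 1/0
}

# Small made-up example with periods 0 and 1 only
df <- data.frame(sms_rev0   = c(1, 0), sms_cnt0   = c(1, 0),
                 sms_rev1   = c(0, 1), sms_cnt1   = c(0, 0),
                 voice_rev0 = c(0, 1), voice_cnt0 = c(0, 1),
                 voice_rev1 = c(1, 0), voice_cnt1 = c(1, 0))

# Element-wise sum of the per-service flags, as a single expression
target <- Reduce("+", lapply(c("sms", "voice"), uses_service, df = df, periods = 0:1))
print(target)  # [1] 2 1
```

On the question's data the equivalent call would be `Reduce("+", lapply(c("sms", "data", "voice"), uses_service, df = headache_df))`, which should reproduce `target_column`.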