Rowwise Column Count in Dataframe

232 views Asked by At

Let's say I have the following dataframe

country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

All I need is a concise way inside the mutate function to get the number of columns (population to population3) that are greater than the value of the pop column within each row.

So what I need is the following results (more specifically the GreaterTotal column) Note: I can get the answer by working through each column but it would take a while with more columns)

  population population2 population3   pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
       <dbl>       <dbl>       <dbl> <dbl> <lgl>        <lgl>        <lgl>               <int>
1        328         133          89    45 TRUE         TRUE         TRUE                    3
2         38          12          39    45 FALSE        FALSE        FALSE                   0
3         30          99          33    45 FALSE        TRUE         FALSE                   1
4         56          83          56    45 TRUE         TRUE         TRUE                    3
5       1393        1033         193    45 TRUE         TRUE         TRUE                    3
6        126         101         126    45 TRUE         TRUE         TRUE                    3
7         57          33          58    45 TRUE         FALSE        TRUE                    2

I've tried using apply with the row index, but I can't get at it. Can somebody please point me in the right direction?

2

There are 2 answers

0
Ronak Shah On BEST ANSWER

You can select the 'Population' columns and compare those column with pop and use rowSums to count how many of them are greater in each row.

cols <- grep('population', names(country_df))
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)

#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2

In dplyr 1.0.0, you can do this with rowwise and c_across :

country_df %>%
  rowwise() %>%
  mutate(GreaterTotal = sum(c_across(population:population3) > pop))
0
akrun On

Using tidyverse, we can do

library(dplyr)
country_df %>%
      mutate(GreaterTotal = rowSums(select(., 
              starts_with('population')) > .$pop) )

-output

# A tibble: 7 x 5
#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2