In R: row wise return max value and corresponding column name

598 views Asked by At

I am trying to obtain the max value row wise across several columns, and create 2 new columns with the max value, and the corresponding column name. Then, using the column name, I need to select the value of another column that shares a substring of that column name.

This is an example of what I trying to solve:

measure_day1 <-  c(1,2,5)
measure_day2 <- c(5,7,1)
measure_day3 <- c(2,3,9)
temp_day1 <- c(25, 27, 29)
temp_day2 <- c(31, 33, 35)
temp_day3 <- c(14, 16, 19)

df <- data.frame(measure_day1, measure_day2, measure_day3, temp_day1, temp_day2, temp_day3)

  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3
1            1            5            2        25        31        14
2            2            7            3        27        33        16
3            5            1            9        29        35        19

And this would be the result:

 measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3 measure_max day_measure_max temp_day_measure_max
1            1            5            2        25        31        14           5    measure_day2                   31
2            2            7            3        27        33        16           7    measure_day2                   33
3            5            1            9        29        35        19           9    measure_day3                   19

I found this similar question but could not complete my task.

For each row return the column name of the largest value

Any help is greatly appreciated.

2

There are 2 answers

0
Duck On BEST ANSWER

Try this tidyverse approach. It can be more practical reshaping data to long previously creating an id per row and then extract the desired values using filter. Using pivot_wider() you can have the desired values and then the filter for max values is applied. Finally you can merge to your original data using left_join() and the id you created based on rows. Here the code:

library(dplyr)
library(tidyr)
#Code
newdf <- df %>% mutate(id=1:n()) %>%
  left_join(df %>% mutate(id=1:n()) %>%
  pivot_longer(-id) %>%
  separate(name,c('Var','Day'),sep='_') %>%
  pivot_wider(names_from=Var,values_from=value) %>%
  group_by(id) %>%
  filter(measure==max(measure)) %>%
  mutate(Day=paste0('measure_',Day)) %>% select(-measure) %>%
  rename(measure_max=Day,temp_day_measure_max=temp)) %>% select(-id)

Output:

  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3  measure_max temp_day_measure_max
1            1            5            2        25        31        14 measure_day2                   31
2            2            7            3        27        33        16 measure_day2                   33
3            5            1            9        29        35        19 measure_day3                   19
0
Ronak Shah On

A base R method :

#Select measure columns
measure_cols <- grep('measure', names(df), value = TRUE)
#Select temp_day column
temp_cols <- grep('temp_day', names(df))
#Get the index of max value in measure column
inds <- max.col(df[measure_cols])
#Get max value in measure columns
df$measure_max <- do.call(pmax, df[measure_cols])
#Get the name of max value in measure columns
df$day_measure_max <- measure_cols[inds]
#Get the corresponding temp column. 
df$temp_day_measure_max <- df[temp_cols][cbind(1:nrow(df), inds)]
df

#  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3
#1            1            5            2        25        31        14
#2            2            7            3        27        33        16
#3            5            1            9        29        35        19

#  measure_max day_measure_max temp_day_measure_max
#1           5    measure_day2                   31
#2           7    measure_day2                   33
#3           9    measure_day3                   19