Find last non-missing value per row

126 views Asked by At

I have a dataframe with a variable var that was measured at time points 0 to 2. Like this:

df <- data.frame(id= letters[1:5],
                 var0= c(1:3, NA, 5),
                 var1= c(11, NA, NA, 14:15),
                 var2= c(NA, NA, NA, NA, 25))
df
  id var0 var1 var2
1  a    1   11   NA
2  b    2   NA   NA
3  c    3   NA   NA
4  d   NA   14   NA
5  e    5   15   25

For each row, i.e. for each person, I want to keep the newest non-missing value. So the desired output is:

  id var0 var1 var2 last_val
1  a    1   11   NA       11
2  b    2   NA   NA        2
3  c    3   NA   NA        3
4  d   NA   14   NA       14
5  e    5   15   25       25

How to do this within the tidyverse packages?

5

There are 5 answers

0
ThomasIsCoding On

A trick with left_join + pivot_longer

df %>%
    left_join(
        (.) %>%
            pivot_longer(!id, values_to = "last_val") %>%
            na.omit() %>%
            slice_tail(n = 1, by = id) %>%
            select(!name)
    )

gives

  id var0 var1 var2 last_val
1  a    1   11   NA       11
2  b    2   NA   NA        2
3  c    3   NA   NA        3
4  d   NA   14   NA       14
5  e    5   15   25       25
0
Maël On

You can use coalesce. The trick is to reverse the order of the variables in coalesce so that it returns the last non-NA value in the real data:

library(dplyr)
df |> 
  mutate(last_val = coalesce(var2, var1, var0))

#   id var0 var1 var2 last_val
# 1  a    1   11   NA       11
# 2  b    2   NA   NA        2
# 3  c    3   NA   NA        3
# 4  d   NA   14   NA       14
# 5  e    5   15   25       25

If you want to make use of tidy selection, you can define a coacross function first, and then reverse the order. More about using across with coalesce here.

coacross <- function(...) {
  coalesce(!!!across(...))
}

df |> 
  mutate(last_val = coacross(rev(contains("var"))))
0
asd-tm On

Here is a single-line dplyr solution:

library(dplyr)
mutate(df, last_val = coalesce(var2, var1, var0))

Result:

  id var0 var1 var2 last_val
1  a    1   11   NA       11
2  b    2   NA   NA        2
3  c    3   NA   NA        3
4  d   NA   14   NA       14
5  e    5   15   25       25

As we were almost simultaneous with @Maël giving our first solution(s) here is another option:

  df %>% 
    mutate(last_val = apply(.[,-1], MARGIN = 1, \(x) do.call(coalesce, as.list(x[length(x):1]))))
0
Jilber Urbina On

An R base alternative:

> df$last_val <- apply(df, 1, \(x) tail(na.omit(x), 1))
> df
  id var0 var1 var2 last_val
1  a    1   11   NA       11
2  b    2   NA   NA        2
3  c    3   NA   NA        3
4  d   NA   14   NA       14
5  e    5   15   25       25
0
jblood94 On

Vectorized using max.col:

df$last_val <- df[,-1][cbind(1:nrow(df[,-1]), max.col(!is.na(df[,-1]), "l"))]
df
#>   id var0 var1 var2 last_val
#> 1  a    1   11   NA       11
#> 2  b    2   NA   NA        2
#> 3  c    3   NA   NA        3
#> 4  d   NA   14   NA       14
#> 5  e    5   15   25       25