Filling NA row values with nearest right side row value in R


I want to convert the given dataframe from

               c1   c2   c3    c4    c5
VEG PUFF     <NA>   12 <NA>  <NA> 78.43
CHICKEN PUFF <NA>   16 <NA> 88.24  <NA>
BAKERY Total <NA> <NA>   28  <NA> 84.04

to

             c1    c2
VEG PUFF     12 78.43
CHICKEN PUFF 16 88.24
BAKERY Total 28 84.04

I tried two methods, but neither gives accurate results: sometimes a value is taken from the left side of the row instead of the right.

library(zoo)  # provides na.locf
step1 <- t(na.locf(t(df), fromLast = TRUE))
step2 <- t(na.locf(t(step1), fromLast = FALSE))

library(dplyr)
MyReplace <- function(data) {
  data %>% t %>% na.locf(fromLast = TRUE) %>% na.locf %>% t
}

There are 2 answers

Answer by Ronak Shah (best answer)

Update

As there was a lot of confusion about the expected output, I am updating the answer, as suggested by @DavidArenburg, with a tidyverse solution

library(dplyr)
library(tidyr)
df %>%
  add_rownames() %>%
  gather(variable, value, -rowname) %>%
  filter(!is.na(value)) %>%
  group_by(rowname) %>%
  mutate(indx = row_number()) %>%
  select(-variable) %>%
  spread(indx, value)

#        rowname   `1`   `2`
#*        <chr> <dbl> <dbl>
#1 BAKERY_Total    28 84.04
#2 CHICKEN_PUFF    16 88.24
#3     VEG_PUFF    12 78.43

Another solution could be

library(data.table)
temp <- apply(df, 1, function(x) data.frame(matrix(x[!is.na(x)], nrow = 1)))
rbindlist(temp, fill = T)

Previous Answer

If I have understood you correctly, you are trying to replace the NA values in a row with the next non-NA value to their right in the same row

We can use na.locf with fromLast set as TRUE

t(apply(df, 1, function(x) na.locf(x, fromLast = T, na.rm = F)))


#             c1 c2    c3    c4    c5
#VEG_PUFF     12 12 78.43 78.43 78.43
#CHICKEN_PUFF 16 16 88.24 88.24    NA
#BAKERY_Total 28 28 28.00 84.04 84.04
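
To make it clearer what filling "from last" does to a single row, here is a minimal base R sketch. It is not the zoo implementation; fill_from_right is a hypothetical helper written only for illustration, and the input vector is the CHICKEN PUFF row from the question.

```r
# Hypothetical helper (not part of zoo): replace each NA with the
# nearest non-NA value to its right; trailing NAs stay NA.
fill_from_right <- function(x) {
  nxt <- NA
  for (i in rev(seq_along(x))) {
    if (is.na(x[i])) x[i] <- nxt else nxt <- x[i]
  }
  x
}

row <- c(c1 = NA, c2 = 16, c3 = NA, c4 = 88.24, c5 = NA)
filled <- fill_from_right(row)
print(filled)
```

The trailing NA in c5 has nothing to its right, so it is left as NA, which matches the CHICKEN_PUFF row in the output above.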
Answer by akrun

We can use na.omit

t(apply(df, 1, na.omit))
#             [,1]  [,2]
#VEG PUFF       12 78.43
#CHICKEN PUFF   16 88.24
#BAKERY Total   28 84.04

Update

Based on the Excel data shown

lst <- apply(df, 1, na.omit)
df2 <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
row.names(df2) <- row.names(df)

Or another option is melt/dcast from data.table

library(data.table)
dcast(melt(setDT(df1, keep.rownames=TRUE), id.var = 'rn', 
         na.rm = TRUE), rn~ paste0("c", rowid(rn)), value.var = "value")
#             rn c1    c2  c3
#1: BAKERY Total 28 84.04  NA
#2: CHICKEN PUFF 16 88.24 143
#3:     VEG PUFF 12 78.43  NA

To provide a reproducible example,

df1 <- structure(list(c1 = c(NA, NA, NA), c2 = c(12L, 16L, NA), c3 = c(NA, 
NA, 28L), c4 = c(NA, 88.24, NA), c5 = c(78.43, 143, 84.04)), .Names = c("c1", 
"c2", "c3", "c4", "c5"), class = "data.frame", row.names = c("VEG PUFF", 
"CHICKEN PUFF", "BAKERY Total"))

lst <- lapply(seq_len(nrow(df1)), function(i) {
               x1 <- unlist(df1[i,])
               x1[complete.cases(x1)]})
df2 <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
row.names(df2) <- row.names(df1)

The approach above is similar to the apply method, except that it always returns a list. With apply the result type varies: when every row has the same number of elements after the NAs are removed, it returns a matrix; otherwise it returns a list. So we loop over the sequence of rows, remove the NA elements, pad each list element with NA at the end so that all have the same length, and then rbind them
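
A small sketch of that difference, using made-up toy data (the data frame toy and its values are assumptions for illustration, not the asker's data):

```r
# Toy data: row "b" keeps three values after dropping NA, rows "a" and "c" keep two.
toy <- data.frame(c1 = c(NA, 16, NA), c2 = c(12, NA, 28),
                  c3 = c(78.43, 88.24, NA), c4 = c(NA, 143, 84.04),
                  row.names = c("a", "b", "c"))

# apply() returns a list here, because the rows keep different numbers of values ...
ragged <- apply(toy, 1, function(x) x[!is.na(x)])
# ... but a matrix when every row keeps the same number of values.
even <- apply(toy[c("a", "c"), ], 1, function(x) x[!is.na(x)])

# Pad every list element with NA up to the longest length, then bind the rows.
padded <- do.call(rbind, lapply(ragged, `length<-`, max(lengths(ragged))))
print(padded)
```

Because the return type of apply depends on the data, code that always expects a list is safer with the lapply version.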


Or another option is which with arr.ind=TRUE

ind <- which(!is.na(df), arr.ind=TRUE)
matrix(df[ind[order(ind[,1]),]], ncol=2, byrow=TRUE, 
            dimnames = list(row.names(df), paste0("c", 1:2)))
#             c1    c2
#VEG PUFF     12 78.43
#CHICKEN PUFF 16 88.24
#BAKERY Total 28 84.04
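
To see what the index matrix looks like, here is a small sketch on toy data (toy, v1 and v2 are names made up for illustration). It assumes, as in the original data, that every row has exactly two non-NA values; otherwise the reshape into two columns would not line up.

```r
toy <- data.frame(c1 = c(NA, 16), c2 = c(12, NA), c3 = c(78.43, 88.24),
                  row.names = c("a", "b"))

# One (row, col) pair per non-NA cell, listed column by column.
ind <- which(!is.na(toy), arr.ind = TRUE)

# Sort the pairs by row so each row's values are adjacent, left to right.
ind_sorted <- ind[order(ind[, 1]), ]

# Index the values with the two-column matrix and reshape row by row.
res <- matrix(as.matrix(toy)[ind_sorted], ncol = 2, byrow = TRUE,
              dimnames = list(row.names(toy), c("v1", "v2")))
print(res)
```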