How do I properly refer to data.frame cells in functions?


My data consists of instrument readings and instrument baselines. The baseline data is sparse (measured only at discrete points) and typically does not extend to the "ends" of the dataset (i.e. the first and last rows). Therefore I want to write a function that looks at the baseline column and copies the values of the earliest and latest baseline points to the very first and last rows of the dataset, so that I can interpolate between them with approx().
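For reference, base R's approx() can extend the end values by itself: with `rule = 2`, points outside the range of the observed values get the value of the nearest observed end point. That matches copying the first/last baseline value to the ends and then interpolating. A minimal sketch, assuming a numeric time axis and a baseline vector with NAs (the toy values are made up):

```r
# Toy data (made up): baseline only observed at times 3 and 7
time     <- 1:10
baseline <- c(NA, NA, 2, NA, NA, NA, 6, NA, NA, NA)

ok <- !is.na(baseline)  # positions that carry an observed baseline value

# rule = 2: outside [3, 7] approx() returns the value at the closest end point,
# inside it interpolates linearly between the observed points
filled <- approx(x = time[ok], y = baseline[ok], xout = time, rule = 2)$y
filled
# [1] 2 2 2 3 4 5 6 6 6 6
```

With fewer than two observed points approx() stops with an error, so a reusable helper built on this would need a guard for that case.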

So far I have done this manually, as shown below, but I need to do this task over and over again, so I'd like to turn it into a function. I checked other threads here, and from what I read, I think the problem must have to do with the different ways of addressing columns and cells, especially when using self-made functions on data.frames.

Here is an example:

#Make Two data frames: one holds instrument data, and one holds some 
#baseline calibration we need to extend to the ends of the dataset

time<-seq(1,100,1)
data1<-rnorm(n = 100,mean = 7.5, sd = 1.1)
table1<-data.frame(cbind(time, data1))

time<-data.frame("time"=seq(2,96,4))
data2<-(0.32*rnorm(n = 24, mean = 1, sd = 1))
table2<-cbind(time,data2)

rm(time)

#now merge the two tables
newtable<-merge(table1, table2, by="time", all=T)

#remove junk
rm(data1, data2,table1,table2)

#copy 3rd column for later testing
newtable$data3<-newtable$data2

#the old manual way to fill the first row
newtable$data2[1]<-newtable$data2[min(which(!is.na(newtable$data2)))]

#the old manual way to fill the last row
newtable$data2[nrow(newtable)]<-newtable$data2[max(which(!is.na(newtable$data2)))]

#Now I try with a function

endfill<-function(df, col){
 
   #fill the first row
  df[1,col] <- df[min(which(!is.na(df[[col]]))), col]    # using = instead of <- has no effect
  df[nrow(df),col]<-df[max(which(!is.na(df[[col]]))),col]
  # 

}  

#I want to try my function on column 4:

endfill(df=  newtable,col = 4)

#Does not work...

Another try:

endfill<-function(df, col){
 
   #fill the first row
  df$col[1] <-  df[[col]]  [min(which(!is.na(df[[col]])))] # using $names
  #df[nrow(df),col]<-df[max(which(!is.na(df[[col]]))),col]
  # 

}  

endfill(df=  newtable,col = 4)
# :-(
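One likely issue with this second attempt: `$` does not evaluate its right-hand side, so `df$col` always refers to a column literally named "col", whatever value the argument `col` holds, and assigning to it appends a new column. `[[` (like `[ , ]`) evaluates `col`, so it works with a name or a position. A small illustration on made-up data:

```r
df  <- data.frame(a = 1:3, b = c(NA, 5, 6))
col <- "b"

# `$` uses the literal name: there is no column called "col"
is.null(df$col)    # TRUE

# `[[` evaluates the variable, so this is column "b"
df[[col]]          # NA 5 6

# Assigning through `$` silently adds a new column named "col";
# the single value is recycled to all rows
df2 <- df
df2$col[1] <- 99
names(df2)         # "a" "b" "col"
df2$col            # 99 99 99
```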

In the function I have tried different ways to address cells: first df$col[1], then df[[col]][1], and mixed versions, but I seem to be missing a point here. When I execute my function above in pieces, e.g. only the parts before and after the "<-", they all make sense, i.e. they return NA for empty cells or the target value. But it seems impossible to do real assignments?!
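The behaviour described here follows from R's copy-on-modify semantics: a function works on its own copy of the data frame, so assignments inside the body change only that local copy. The function has to return the modified object, and the caller has to assign it back. A minimal sketch with hypothetical helpers `set_first()` and `set_first_fixed()`:

```r
# Hypothetical helper: overwrite the first value of column `col`
set_first <- function(df, col, value) {
  df[1, col] <- value  # changes the function's local copy only
  # no explicit return: the function returns the value of its last
  # expression, i.e. the assigned `value` (invisibly), not the data frame
}

set_first_fixed <- function(df, col, value) {
  df[1, col] <- value
  df                   # return the modified copy
}

tab <- data.frame(x = c(NA, 2, 3))

res <- set_first(tab, "x", 1)
tab$x    # NA 2 3  (original unchanged)
res      # 1       (the assigned value, not a data frame)

tab <- set_first_fixed(tab, "x", 1)  # assign the returned copy back
tab$x    # 1 2 3
```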


There are 3 answers

Rui Barradas answered:

Here is a solution with function na.locf from package zoo.

endfill <- function(DF, col) {
  if(nrow(DF) > 0L) {
    DF[[col]] <- zoo::na.locf(DF[[col]], na.rm = FALSE)
    DF[[col]] <- zoo::na.locf(DF[[col]], na.rm = FALSE, fromLast = TRUE)
  }
  DF
}

df1 <- data.frame(
  x1 = c(NA, 1:3, NA),
  x2 = c(NA, NA, 1:2, NA)
)

endfill(df1, "x1")
#>   x1 x2
#> 1  1 NA
#> 2  1 NA
#> 3  2  1
#> 4  3  2
#> 5  3 NA
endfill(df1, "x2")
#>   x1 x2
#> 1 NA  1
#> 2  1  1
#> 3  2  1
#> 4  3  2
#> 5 NA  2

Created on 2024-02-26 with reprex v2.0.2

Onyambu answered:

If you are filling the NAs, use tidyr::fill():

tidyr::fill(df1, tidyr::everything(), .direction = 'downup')

  x1 x2
1  1  1
2  1  1
3  2  1
4  3  2
5  3  2

With piping, this can be written as:

library(tidyverse)
df1 %>%
  fill(x1:x2, .direction = 'downup')
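One caveat for both answers, if the goal is still the approx() interpolation from the question: carrying observations forward and backward also fills every interior gap, so the baseline becomes a step function and approx() has nothing left to interpolate. Filling only the two ends, as the question's own function does, keeps the interior NAs for approx(). The sketch below uses a small base-R stand-in, `locf()` and `downup()` (hypothetical helpers, not the zoo or tidyr code), to show the difference on made-up data:

```r
# Base-R stand-in for last-observation-carried-forward (illustration only)
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA so far
  idx[idx == 0L] <- NA                      # leading NAs have nothing to carry
  x[idx]
}
downup <- function(x) rev(locf(rev(locf(x))))  # fill down first, then up

baseline <- c(NA, 1, NA, NA, 4, NA)
time     <- seq_along(baseline)

# Filling every gap leaves a step function: nothing left to interpolate
step <- downup(baseline)
step                                   # 1 1 1 1 4 4

# Filling only the first and last row keeps the interior NAs ...
ends <- baseline
obs  <- which(!is.na(ends))
ends[1]            <- ends[min(obs)]
ends[length(ends)] <- ends[max(obs)]

# ... so approx() can interpolate them linearly (NA pairs are dropped)
ramp <- approx(time, ends, xout = time)$y
ramp                                   # 1 1 2 3 4 4
```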
SeRo1210 answered:

The option offered above by Rui works and fills the gaps directly. Thanks all for taking the time.

In the meantime I also found that the syntax for addressing single cells in the data frame was not what caused the error. I had forgotten to RETURN the modified data frame from the function. Because R functions work on a local copy of their arguments, the changes are lost unless the function returns df and the caller assigns the result, e.g. newtable <- endfill(newtable, 4).

This works for me:

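endfill <- function(df, col) {
  # fill the first row with the earliest non-NA value
  df[1, col] <- df[min(which(!is.na(df[[col]]))), col]
  # fill the last row with the latest non-NA value
  df[nrow(df), col] <- df[max(which(!is.na(df[[col]]))), col]
  df  # return the modified data frame
}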
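Because this version returns the data frame, the result has to be assigned back; calling it without assignment only prints the modified copy. Below is a slightly extended sketch: the name `endfill2` and the guard for an all-NA column are additions for illustration, not part of the answer above. Without the guard, `min(which(...))` on an empty index returns Inf with a warning. The loop shows the same call applied to several columns on made-up data:

```r
# Copy the first/last non-NA value of column `col` to the first/last row.
# Guard (an assumption, not in the original): an all-NA column is returned unchanged.
endfill2 <- function(df, col) {
  obs <- which(!is.na(df[[col]]))
  if (length(obs) == 0L) return(df)
  df[1, col]        <- df[min(obs), col]
  df[nrow(df), col] <- df[max(obs), col]
  df
}

tab <- data.frame(
  time = 1:5,
  b1   = c(NA, 2, NA, 5, NA),
  b2   = c(NA, NA, NA, NA, NA)   # all-NA column: left unchanged by the guard
)

# Each call returns a modified copy, which is assigned back to `tab`
for (cl in c("b1", "b2")) tab <- endfill2(tab, cl)
tab$b1   # 2 2 NA 5 5
tab$b2   # NA NA NA NA NA
```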