Fastest way to check if dataframe is empty

What is the fastest (every microsecond counts) way to check if a data.frame is empty? I need it in the following context:

if (<df is not empty>) { do something here }

Possible solutions:

1) if (is.empty(df$V1) == FALSE), using is.empty from the `spatstat' package

2) if(nrow(df) != 0)

3) Your solution

I could do:

library(microbenchmark)
microbenchmark(is.empty(df),times=100)
Unit: microseconds
         expr min  lq mean median  uq max neval
 is.empty(df) 5.8 5.8  6.9      6 6.2  66   100 

but I am not sure how to time option 2). And what is your solution for checking whether a data.frame is empty?

Thanks!

There is 1 answer

Answer by Frank (best answer)

Suppose we have two types of data.frames:

emptyDF = data.frame(a=1,b="bah")[0,]
fullDF  = data.frame(a=1,b="bah")

DFs = list(emptyDF,fullDF)[sample(1:2,1e4,replace=TRUE)]

and your if condition shows up in a loop like

boundDF = data.frame()
for (i in seq_along(DFs)) {
  if (nrow(DFs[[i]]))
    boundDF <- rbind(boundDF, DFs[[i]])
}

In this case, you're approaching the problem the wrong way. The if statement is not necessary, because zero-row data.frames contribute no rows when bound: do.call(rbind,DFs) or library(data.table); rbindlist(DFs) is faster and clearer than the loop.
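As a rough check of that claim, the sketch below builds the result both ways in base R and compares row counts. It reuses the setup above with a shorter list; the fixed seed, the list length of 200, and the names loopDF, bindDF and n_full are assumptions made for illustration only.

```r
# Base R only. Mirrors the setup above, with a shorter list so it runs quickly.
set.seed(1)  # assumption: fixed seed so the run is reproducible
emptyDF <- data.frame(a = 1, b = "bah")[0, ]
fullDF  <- data.frame(a = 1, b = "bah")
DFs <- list(emptyDF, fullDF)[sample(1:2, 200, replace = TRUE)]

# Loop version with an explicit emptiness check before each rbind
loopDF <- data.frame()
for (i in seq_along(DFs)) {
  if (nrow(DFs[[i]]) > 0) loopDF <- rbind(loopDF, DFs[[i]])
}

# Single call with no check: zero-row data.frames add no rows
bindDF <- do.call(rbind, DFs)

# Total number of rows across all inputs
n_full <- sum(vapply(DFs, nrow, integer(1)))
same_rows <- nrow(loopDF) == n_full && nrow(bindDF) == n_full
```

If same_rows is TRUE, both approaches kept exactly the non-empty rows, so the check inside the loop bought nothing.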

More generally, you are looking for performance improvements in the wrong place. Whatever operation you perform inside the loop, checking whether the data.frame is non-empty will not be the step that takes the most time. There may be room to optimize that step, but as Donald Knuth said, "premature optimization is the root of all evil."
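If you still want to time option 2) from the question, a minimal sketch follows. It assumes "empty" means zero rows. The names checks and results are hypothetical, and the benchmark runs only if the microbenchmark package is installed. dim() and .row_names_info() are base R functions that nrow() relies on for data.frames, so any difference between the three should be tiny.

```r
df <- data.frame(a = 1, b = "bah")[0, ]  # a zero-row data.frame

# Three equivalent zero-row checks, from most readable to most low-level
checks <- list(
  nrow_check = function(d) nrow(d) == 0L,
  dim_check  = function(d) dim(d)[1L] == 0L,
  rni_check  = function(d) .row_names_info(d, 2L) == 0L
)

# All three should agree that df is empty
results <- vapply(checks, function(f) f(df), logical(1))

# Time the checks only when microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    nrow(df) != 0,
    dim(df)[1L] != 0,
    .row_names_info(df, 2L) != 0,
    times = 1000
  ))
}
```

Compare the timings with the cost of whatever runs inside the if block before deciding that the check is worth optimizing.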