Warning message when running the mongo.cursor.to.data.frame function in rmongodb

When I run the query

mongo.cursor.to.data.frame(cursor)

to fetch the documents in a collection into a data frame in R using rmongodb, I get the warning message:

In mongo.cursor.to.data.frame(cursor) : This fails for most NoSQL data structures. I am working on a new solution

I checked some articles about rmongodb and found this message mentioned there too. Does this warning mean that there might be issues in the resulting data frame?

Answer by SymbolixAU

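The source code shows that the warning is issued unconditionally as the first line of the function, so it appears on every call regardless of your data. It also shows where real issues could arise: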

mongo.cursor.to.data.frame <- function(cursor, nullToNA=TRUE, ...){

  warning("This fails for most NoSQL data structures. I am working on a new solution")

  res <- data.frame()
  while ( mongo.cursor.next(cursor) ){
    val <- mongo.bson.to.list(mongo.cursor.value(cursor))

    if( nullToNA == TRUE )
      val[sapply(val, is.null)] <- NA

    # remove mongo.oid -> data.frame can not deal with that!
    val <- val[sapply(val, class) != 'mongo.oid']

    res <- rbind.fill(res, as.data.frame(val, ... ))

  }
  return( as.data.frame(res) )
}

We can see it uses plyr::rbind.fill to bind the data.frames row by row. So it all comes down to what is passed to rbind.fill, namely as.data.frame(val, ...).

And val is the current document converted to an R list: val <- mongo.bson.to.list(mongo.cursor.value(cursor)).

So as long as as.data.frame(val, ...) can handle the list structure of each document, you're OK.
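To see what this means in practice, here is a minimal base-R sketch of the same loop that runs without a MongoDB connection. It assumes each document has already been turned into an R list (what mongo.bson.to.list() would return), and that array fields arrive as atomic vectors. It omits the mongo.oid filtering step and swaps plyr::rbind.fill for a small hypothetical helper, rbind_fill_base(), that pads missing columns with NA. docs_to_df() is likewise a hypothetical name, not part of rmongodb.

```r
# Hypothetical stand-in for plyr::rbind.fill(): take the union of the column
# names, pad whatever each side is missing with NA, then rbind() by name.
rbind_fill_base <- function(x, y) {
  if (nrow(x) == 0) return(y)
  if (nrow(y) == 0) return(x)
  cols <- union(names(x), names(y))
  for (nm in setdiff(cols, names(x))) x[[nm]] <- NA
  for (nm in setdiff(cols, names(y))) y[[nm]] <- NA
  rbind(x[cols], y[cols])
}

# Hypothetical helper mirroring the loop in mongo.cursor.to.data.frame(),
# but iterating over an in-memory list of documents instead of a cursor.
docs_to_df <- function(docs, nullToNA = TRUE) {
  res <- data.frame()
  for (val in docs) {
    # same NULL -> NA replacement as the original function
    if (nullToNA) val[vapply(val, is.null, logical(1))] <- NA
    # each document is coerced on its own, then stacked onto the result
    res <- rbind_fill_base(res, as.data.frame(val, stringsAsFactors = FALSE))
  }
  res
}

# Hypothetical sample documents, assumed to be already converted to R lists
docs <- list(
  list(name = "a", n = 1),
  list(name = "b", tags = c("x", "y", "z")),  # an array field
  list(name = "c", n = NULL)
)
df <- docs_to_df(docs)
print(df)
```

No error is raised, but the array field in the second document is spread over three rows (with name recycled), so three documents come back as five rows. The original function calls as.data.frame(val, ...) in the same way, so under the assumption above it would behave the same: the call succeeds, yet the result no longer has one row per document.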

However, it's easy to construct a NoSQL data structure that fails this test:

## Consider the JSON structure
## [{"a":[1,2,3],"b":["a","b","c"]},{"d":[1.1,2.2,3.3],"e":[["nested","list"]]}]

## which in R is equivalent to
lst <- list(list(a = c(1L, 2L, 3L),
                 b = c("a", "b", "c")),
            list(d = c(1.1, 2.2, 3.3),
                 e = list(c("nested", "list"))))

## This errors when coerced to a data.frame:
as.data.frame(lst)
## Error in data.frame(d = c(1.1, 2.2, 3.3), e = list(c("nested", "list")),  :
##   arguments imply differing number of rows: 3, 2
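Because mongo.cursor.to.data.frame() coerces one document at a time, a quick way to find the documents that will cause trouble is to try as.data.frame() on each one and catch the error. The sketch below does this for the two documents in lst. coerces_cleanly() is a hypothetical helper written for illustration, not part of rmongodb.

```r
# Same two documents as `lst` above.
lst <- list(list(a = c(1L, 2L, 3L),
                 b = c("a", "b", "c")),
            list(d = c(1.1, 2.2, 3.3),
                 e = list(c("nested", "list"))))

# Hypothetical helper: TRUE if a single document survives as.data.frame(),
# FALSE if the coercion throws an error.
coerces_cleanly <- function(doc) {
  tryCatch({
    as.data.frame(doc)
    TRUE
  }, error = function(e) FALSE)
}

ok <- vapply(lst, coerces_cleanly, logical(1))
print(ok)
```

Only the second document fails. Since the loop in mongo.cursor.to.data.frame() has no error handling, a single such document anywhere in the cursor is enough to make the whole call fail.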

At this point I should mention the mongolite package, which is generally faster but again returns a data.frame.

And there's also my extension to mongolite, mongolitedt (not yet on CRAN), which is quicker still at retrieving data, but again is limited because the result has to be coerced into a data.table.