How to write binary data into SQLite with R DBI's dbWriteTable()?

3k views Asked by At

For instance, how to execute the equivalent following SQL (which inserts into a BINARY(16) field)

INSERT INTO Table1 (MD5) VALUES (X'6717f2823d3202449201145073ab871A'),(X'6717f2823d3202449301145073ab371A')

using dbWriteTable()? Doing

dbWriteTable(db, "Table1", data.frame(MD5 = "X'6717f2823d3202449201145073ab871A'", ...), append = T, row.names = F)

doesn't seem to work - it writes the values as text.

In the end, I'm going to have a big data.frame of hashes that I want to write, and so perfect for using dbWriteTable. But I just can't figure out how to INSERT the data.frame into binary database fields.

1

There are 1 answers

3
jlhoward On

So here are two possibilities that seem to work. The first uses dbSendQuery(...) in a loop (you've probably thought of this already...).

db.WriteTable = function(con,table,df) {  # no error checking whatsoever...
  require(DBI)
  field <- colnames(df)[1] 
  for (i in 1:nrow(df)) {
    query <- sprintf("INSERT INTO %s (%s) VALUES (X'%s')",table,field,df[i,1])
    rs    <- dbSendQuery(con,statement=query)
  }
  return(nrow(df))
}

library(DBI)
drv <- dbDriver("SQLite")
con <- dbConnect(drv)
rs  <- dbSendQuery(con, statement="CREATE TABLE hash (MD5 BLOB)")

df  <- data.frame(MD5=c("6717f2823d3202449201145073ab871A",
                        "6717f2823d3202449301145073ab371A"))

rs       <- db.WriteTable(con,"hash",df)
result.1 <- dbReadTable(con,"hash")
result.1
#                                                              MD5
# 1 67, 17, f2, 82, 3d, 32, 02, 44, 92, 01, 14, 50, 73, ab, 87, 1a
# 2 67, 17, f2, 82, 3d, 32, 02, 44, 93, 01, 14, 50, 73, ab, 37, 1a

If your data frame of hashes is very large, then df.WriteFast(...) does the same thing as db.WriteTable(...) only it should be faster.

db.WriteFast = function(con.table,df) {
  require(DBI)
  field <- colnames(df)[1]
  lapply(unlist(df[,1]),function(x){
         dbSendQuery(con,
                     statement=sprintf("INSERT INTO %s (%s) VALUES (X'%s')",
                                        table,field,x))})
}

Note that result.1 is a data frame, and if we use it in a call to dbWriteTable(...) we can successfully write the hashes to a BLOB. So it is possible.

str(result.1)
# 'data.frame': 2 obs. of  1 variable:
#  $ MD5:List of 2
#   ..$ : raw  67 17 f2 82 ...
#   ..$ : raw  67 17 f2 82 ...

The second approach takes advantage of R's raw data type to create a data frame structured like result.1, and passes that to dbWriteTable(...). You'd think this would be easy, but no.

h2r = function(x) {
  bytes <- substring(x, seq(1, nchar(x)-1, 2), seq(2, nchar(x), 2))
  return(list(as.raw(as.hexmode(bytes))))
}
hash2raw = Vectorize(h2r)

df.raw=data.frame(MD5=list(1:nrow(df)))
colnames(df.raw)="MD5"
df.raw$MD5 = unname(hash2raw(as.character(df$MD5)))
dbWriteTable(con, "newHash",df.raw)
result.2 <- dbReadTable(con,"newHash")
result.2

all.equal(result.1$MD5,result.2$MD5)
# [1] TRUE

In this approach, we create a data frame df.raw which has one column, MD5, wherein each element is a list of raw bytes. The utility function h2r(...) takes a character representation of the hash, breaks it into a vector of char(2) (the bytes), then interprets each of those as hex (as.hexmode(...)), converts the result to raw (as.raw(...)), and finally returns the result as a list. Vectorize(...) is a wrapper that allows hash2raw(...) to take a vector as its argument.

Personally, I think you're better off using the first approach: it takes advantage of SQLite's internal mechanism for writing hex to BLOBs, and it's much easier to understand.