Someone has used serialize() to stick an R object into SQL Server, in a VARBINARY column.
How can I unserialize() it out?
Here's the code that gets the data:
library(RODBC)
dbhandle <- odbcDriverConnect("MyConnection String")
x = sqlQuery(dbhandle, sprintf('select SomevarbinaryColumn from MyTAble WHERE the_key = 8675309'))
If I try this:
unserialize(x$SomevarbinaryColumn[1])
I get
Error in unserialize(x$SomevarbinaryColumn[1]) :
'connection' must be a connection
Confusingly, the first parameter to unserialize() is named connection. The documentation says "unserialize reads an object (as written by serialize) from connection or a raw vector.". What kind of connection might be provided here? Instead, though, I think I'm trying to pass a raw vector, but I can't figure out how to convert the RODBC result to a raw vector.
I've tried to figure out what type x$SomevarbinaryColumn[1] is so I can convert it to something that unserialize() likes:
print(typeof(x$SomevarbinaryColumn[1]))
gives
[1] "list"
the RODBC docs say that varbinary will come back as "a list of raw vectors."
How do I get the raw vector that unserialize() wants from this returned list?
There are two problems here. One is that RODBC isn't capable of reliably getting binary data back from SQL Server; the other is that I have to figure out how to unpack the binary data returned (if it gets returned) so that I can call
unserialize()on it and rehydrate my object.In the related issue, I've explained what I found when investigating the problems with RODBC's handling of
varbinary(max).For now, let's suppose that RODBC works and gets the correct data back. I can sometimes coax it to do so, but for the long term a better fix is necessary. Maybe the package maintainer will respond, or maybe I'll for the package and make some fixes for myself, or maybe I'll switch to a different package.
Thing is, even if RODBC looks like it is working, it might be returning bad data. If I code this:
then either R encounters a GPE or I get data back. The data that comes back is the correct length, but the data is incorrect -- that is, it is uninitialized garbage. It can never be un-serialized into my object.
However, I can code this to work around the RODBC bug, sometimes:
and then I find that I do get correct data back. It's the right length, and contains the expected bytes!
All that's left is to find the right way to get my object out. And that's here:
Hopefully, that answer helps people who are using more reliable storages and face the question of how they might unpack their serialized data.