Difficulty reading a matrix as a node in rdflib


I created an RDF file in which images are stored in nodes as matrices. However, when I try to read them back, I cannot get the matrix form. For example:

import numpy as np
from rdflib import Literal
mm = np.random.normal(0,1,(3,3))
L = Literal(mm)

Here it is very easy to get the matrix back with L.value:

In [494]: L
Out[494]: rdflib.term.Literal(u'[[-1.39304728  0.39093531 0.88042378]\n   [ 0.22605682  0.56064787 -0.75176713]\n [ 0.57021203  0.31796492 -0.53303191]]')

In [495]: L.value
Out[495]:
array([[-1.39304728,  0.39093531,  0.88042378],
       [ 0.22605682,  0.56064787, -0.75176713],
       [ 0.57021203,  0.31796492, -0.53303191]])

However, when I execute a SPARQL query and iterate over its results, stored in image_nodes, I get:

In [501]: res = [q for q in image_nodes]

In [502]: res[0][0]
Out[502]: rdflib.term.Literal(u'[[ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]\n ..., \n [ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]]')

In [503]: (res[0][0]).value
Out[503]: u'[[ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]\n ..., \n [ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]\n [ 0.  0.  0. ...,  0.  0.  0.]]'

Why can't I get the matrix format this time? The value is a unicode string and very resistant to any transformation. Thanks

There is 1 answer

Jörn Hees On

As you didn't provide the SPARQL query, parts of this answer are guesswork...

If you create an rdflib.Literal(obj), RDFLib tries to convert the given object into a suitable RDF (XSD) representation. Certain Python standard types are mapped for this. If the given object has none of those types (like an np.array), the _castPythonToLiteral(obj) method falls back to just returning obj. So the L.value that you observe here is the untouched obj that you passed into Literal(obj), as can be seen in the constructor code, which is arguably confusing. Since the returned Literal instance is itself also a unicode object, you can get the "content" of the Literal with unicode(L). This is important, as it is also what will be stored in your stores / SPARQL endpoint.

Now, if you retrieve your literal from a SPARQL endpoint and obj was none of the standard types, then that endpoint only ever knew the string representation of obj, so it returns l = Literal(str_rep_of_obj). From the SPARQL endpoint's standpoint, that is just a string. RDFLib also sees only a string, and that is why you suddenly get a unicode string in res[0][0].value.
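The "..." in the queried literal above is worth a second look: NumPy abbreviates large arrays when turning them into text, so the stored string may not even contain all the values. A minimal sketch of that effect, assuming Python 3 and NumPy's default print settings (the 50x50 shape is an arbitrary stand-in for an image):

```python
import numpy as np

# A large array standing in for an image; the shape is an arbitrary example.
big = np.zeros((50, 50))

# The plain string form of the array, i.e. the kind of text seen
# in the queried literal in the question.
text = str(big)

# With default print options, NumPy summarizes large arrays and
# replaces the middle rows and columns with "...".
is_truncated = "..." in text
print(is_truncated)
print(len(text.splitlines()), "lines of text for", big.shape[0], "rows")
```

If the text is truncated like this, no amount of parsing will bring the original matrix back, so the array has to be serialized explicitly before it goes into the Literal.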

To fix this, you need to store the array in a string format that can be parsed again, and then de-serialize the string content of res[0][0] back into your np.array, e.g. with json. When creating the literal (this needs import json):

l = Literal(json.dumps(mm.tolist()))

and then later

imagenode = np.array(json.loads(unicode(res_literal)))
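The round trip can be checked without a store. The following is a sketch under assumptions: it uses Python 3, leaves out RDFLib entirely, and lets the plain string `lexical` stand in for the content of the literal (on Python 3, `str(res_literal)` would take the place of `unicode(res_literal)`).

```python
import json

import numpy as np

mm = np.random.normal(0, 1, (3, 3))

# Serialize: array -> nested lists -> JSON text. This text is what
# would be passed to Literal(...) and end up in the store.
lexical = json.dumps(mm.tolist())

# De-serialize: JSON text -> nested lists -> array. Here `lexical`
# stands in for the string content of the literal from the query result.
restored = np.array(json.loads(lexical))

# Python's json module writes floats so that they parse back exactly.
print(np.array_equal(mm, restored))
```

Note that the dtype is not recorded in the JSON text; np.array will infer it again from the values.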

One additional remark, though: as with databases, saving binary blobs such as images in the store itself is usually not a good idea. It's usually better to put the binary data somewhere else and then save a link to it in the database. Isn't it cool that RDF uses URIs ;)
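A rough sketch of that idea, assuming Python 3 and local files: the array is saved with np.save and only a file URI is kept for the graph. The directory, the file name image_0.npy and the variable names are made up for illustration; in RDFLib the URI string could be wrapped in a URIRef, which is left out here.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

import numpy as np

image = np.random.normal(0, 1, (4, 4))

# Hypothetical storage location; a temporary directory is used here
# only so that the example cleans up easily.
storage_dir = Path(tempfile.mkdtemp())
image_path = storage_dir / "image_0.npy"
np.save(image_path, image)

# The link that would be stored in the graph instead of the pixel data.
image_uri = image_path.resolve().as_uri()
print(image_uri)

# Later: turn the URI from a query result back into a path and load the array.
loaded = np.load(url2pathname(urlparse(image_uri).path))
print(np.array_equal(image, loaded))
```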