Single quad + most basic SPARQL query = 1 result in Jena, 2 results in Sesame - who is right?


Add just this one quad to an empty store:

<http://x.com/s> <http://x.com/p> 2 <http://x.com/g> .

Then execute this SPARQL query (taken from Bob DuCharme's book 'Learning SPARQL', so this must be standard SPARQL for retrieving all quads across the dataset, regardless of implementation, right!?):

SELECT ?g ?s ?p ?o
WHERE {
{ ?s ?p ?o }
UNION
{ GRAPH ?g { ?s ?p ?o } } }

But Jena and Sesame reply with different answers!!? Here's what I see:

Jena Fuseki console on Tomcat 6.0.37 (version 2.10.0, out-of-the-box, no configuration changes!), which gives the correct answer as I understand things:

--------------------------------------------------------------
| g                | s                | p                | o |
==============================================================
| <http://x.com/g> | <http://x.com/s> | <http://x.com/p> | 2 |
--------------------------------------------------------------

Sesame Workbench on Tomcat 6.0.37 (version 2.7.3, out-of-the-box, no configuration changes!). I used the 'Add' feature in the workbench to manually add the above quad, pasting it into the 'Enter the RDF data you wish to upload' edit box with 'N-Quad' selected in the 'Data format' dropdown box, and then ran the above query:

--------------------------------------------------------------
| g                | s                | p                | o |
==============================================================
|                  | <http://x.com/s> | <http://x.com/p> | 2 |
| <http://x.com/g> | <http://x.com/s> | <http://x.com/p> | 2 |
--------------------------------------------------------------

So this is kinda scary for someone starting to look at RDF - what am I missing here? I assume Sesame can't be 'wrong' - so it must be my 'interpretation' I suppose (or Bob's query isn't 'standard SPARQL', and so different implementations are free to return different results) - any enlightenment would be very welcome :) !

There is 1 answer

Answered by Jeen Broekstra (best answer)

As @Joshua Taylor points out in his comment, the cause is that Sesame and Jena interpret the default graph differently.

In Sesame, the entire repository is considered the default graph: all statements in all named graphs as well as all statements without a named graph. Therefore, the first argument of your union, which queries the default graph, succeeds and binds ?s, ?p and ?o (but not ?g). The second argument of your union obviously succeeds as well because the original quad is of course in a named graph, and therefore you get two answers.

Jena uses an "exclusive" default graph by default: only statements that are not explicitly added to any particular named graph are in the default graph. Therefore, in Jena, the first part of your union fails (there are no matching statements in Jena's default graph), the second part succeeds, and you therefore only get 1 result.

Strange as it may sound, both are correct. The difference is simply in how the dataset on which the query is executed is set up.
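To make the two behaviours concrete, here is a small Python sketch of how the union query plays out under each default-graph setup. It is a toy model only, not the Jena or Sesame API: the `run_union_query` function, the `union_default_graph` flag and the tuple representation of quads are all made up for illustration.

```python
# Toy model, not the Jena or Sesame API. A quad is (graph, s, p, o);
# graph is None for a statement added without a named graph.
QUADS = [("http://x.com/g", "http://x.com/s", "http://x.com/p", 2)]


def run_union_query(quads, union_default_graph):
    """Evaluate { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } over quads."""
    # Default graph: either the merge of everything in the store (Sesame-like)
    # or only the statements that have no named graph (Jena-like).
    default_graph = {
        (s, p, o)
        for g, s, p, o in quads
        if union_default_graph or g is None
    }

    rows = []
    # Left branch of the UNION: matches the default graph, ?g stays unbound.
    for s, p, o in default_graph:
        rows.append((None, s, p, o))
    # Right branch: matches every named graph and binds ?g to its name.
    for g, s, p, o in quads:
        if g is not None:
            rows.append((g, s, p, o))
    return rows


sesame_like = run_union_query(QUADS, union_default_graph=True)
jena_like = run_union_query(QUADS, union_default_graph=False)

print(len(sesame_like))  # 2: one row without ?g, one row with ?g
print(len(jena_like))    # 1: only the row with ?g
```

The query is the same in both calls; only the definition of the default graph changes, which is why both stores can claim to be right.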

Of course, there are ways to deal with this. In both Jena and Sesame, you can add FROM and FROM NAMED clauses to the query to make the queried dataset explicit (Sesame offers the sesame:nil graph name for explicitly querying those statements that have no named graph associated). Alternatively, you can programmatically modify the dataset definition on which a query is executed. The precise mechanisms differ a bit between Jena and Sesame, but both offer the option: in Sesame, you can create a Dataset object and supply it with your query before executing it; in Jena, I believe you can reconfigure the actual store or model on which you execute the query to behave differently.
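As a rough illustration of what an explicit dataset definition buys you, the following Python sketch builds the default graph and the named graphs from explicit lists, the way FROM and FROM NAMED describe a dataset. Again this is a toy model under my own assumptions, not real engine code: `build_dataset`, `run_union_query` and the `NO_GRAPH` marker (standing in for what sesame:nil selects in Sesame) are hypothetical names.

```python
# Toy model, not the Jena or Sesame API. NO_GRAPH stands in for
# "statements without a named graph".
NO_GRAPH = None
STORE = [("http://x.com/g", "http://x.com/s", "http://x.com/p", 2)]


def build_dataset(store, from_graphs, from_named):
    """Return (default_graph, named_graphs) for an explicit dataset description."""
    # FROM: the default graph is the merge of the listed graphs.
    default = {(s, p, o) for g, s, p, o in store if g in from_graphs}
    # FROM NAMED: only the listed graphs are visible to GRAPH ?g { ... }.
    named = {}
    for g, s, p, o in store:
        if g is not NO_GRAPH and g in from_named:
            named.setdefault(g, set()).add((s, p, o))
    return default, named


def run_union_query(default, named):
    """Evaluate { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } on the dataset."""
    rows = [(None, s, p, o) for s, p, o in default]
    for g, triples in named.items():
        rows.extend((g, s, p, o) for s, p, o in triples)
    return rows


# Default graph = only statements without a named graph.
exclusive = run_union_query(
    *build_dataset(STORE, from_graphs=[NO_GRAPH], from_named=["http://x.com/g"])
)
# Default graph = explicitly includes the named graph as well.
merged = run_union_query(
    *build_dataset(STORE, from_graphs=["http://x.com/g"], from_named=["http://x.com/g"])
)

print(len(exclusive))  # 1
print(len(merged))     # 2
```

Once the dataset is spelled out like this, the result no longer depends on what the store happens to treat as its default graph.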