How to get intersection-like behavior with SPARQL 1.1's VALUES?

Using SPARQL 1.1's VALUES, the following query returns all predicates that have Einstein or Knuth as the subject (along with the predicates' labels).

PREFIX dbp: <http://dbpedia.org/resource/>

SELECT DISTINCT ?sub ?outpred ?label
{
  VALUES ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
  ?sub ?outpred [] .
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}

Is it possible to use this VALUES feature to expose an intersection rather than a union of the predicates? Or am I misunderstanding what VALUES is for?

EDIT: Clarification

For a simplified example, say there are these triples:

<Einstein>  <influenced>    <John>
<Einstein>  <influenced>    <Knuth>
<Einstein>  <born>          <Mars>
<Einstein>  <died>          <Los Angeles>
<Knuth>     <influenced>    <Kirby>
<Knuth>     <born>          <Mars>
<Knuth>     <wrote>         <TAOCP>
<Knuth>     <drove>         <Truck>

The "union" I'm getting is all unique predicates attached to either subject (line separated for clarity):

|  ?sub    |  ?pred     |
-------------------------
<Einstein>  <influenced>
<Knuth>     <influenced>

<Einstein>  <born>
<Knuth>     <born>

<Einstein>  <died>

<Knuth>     <wrote>

<Knuth>     <drove>

The "intersection" I'm after is all unique predicates common to both subjects:

|  ?sub    |  ?pred     |
-------------------------
<Einstein>  <influenced>
<Knuth>     <influenced>

<Einstein>  <born>
<Knuth>     <born>

There is 1 answer

Joshua Taylor (best answer)

The Solutions

You can use a query like this. The trick is to group by the predicate, and only take those predicates for which there are exactly two subjects (Einstein and Knuth).

select distinct ?outpred ?label
{
  values ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
  ?sub ?outpred [] .
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
group by ?outpred ?label
having count(distinct ?sub) = 2
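
To see what the grouping does, here is a minimal Python sketch that mimics the query on the simplified triples from the question. It is only an illustration: the data is the toy example rather than DBpedia, and the name `common_predicates` is made up for this sketch.

```python
from collections import defaultdict

# Toy triples from the question's simplified example (not real DBpedia data).
TRIPLES = [
    ("Einstein", "influenced", "John"),
    ("Einstein", "influenced", "Knuth"),
    ("Einstein", "born", "Mars"),
    ("Einstein", "died", "Los Angeles"),
    ("Knuth", "influenced", "Kirby"),
    ("Knuth", "born", "Mars"),
    ("Knuth", "wrote", "TAOCP"),
    ("Knuth", "drove", "Truck"),
]


def common_predicates(triples, subjects):
    """Mimic: values ?sub {...} ?sub ?p [] group by ?p having count(distinct ?sub) = N."""
    wanted = set(subjects)
    subjects_by_pred = defaultdict(set)
    for s, p, _ in triples:
        if s in wanted:                 # the VALUES restriction
            subjects_by_pred[p].add(s)  # a set plays the role of count(distinct ?sub)
    # Keep only the predicates that were seen with every requested subject.
    return sorted(p for p, subs in subjects_by_pred.items() if len(subs) == len(wanted))


print(common_predicates(TRIPLES, ["Einstein", "Knuth"]))  # ['born', 'influenced']
```

The threshold is `len(wanted)` rather than a hard-coded 2, which is the same idea as matching the number in the HAVING clause to the number of values listed.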

Of course, this does require retrieving all the data that you would need for a union, and then condensing it down. I don't expect that that will be much of a problem, but if it is (e.g., if you're trying to take the intersection for lots of subjects), then you can also just list the subjects separately:

select distinct ?outpred ?label
{
  dbp:Albert_Einstein ?outpred [].
  dbp:Donald_Knuth ?outpred [].
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
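
Listing the subjects separately amounts to joining the two triple patterns on ?outpred, which is a plain set intersection of each subject's predicates. A rough Python sketch of that idea, again on the question's toy triples (the helper `predicates_of` is a hypothetical name used only here):

```python
# Toy triples from the question's simplified example (not real DBpedia data).
TRIPLES = [
    ("Einstein", "influenced", "John"),
    ("Einstein", "influenced", "Knuth"),
    ("Einstein", "born", "Mars"),
    ("Einstein", "died", "Los Angeles"),
    ("Knuth", "influenced", "Kirby"),
    ("Knuth", "born", "Mars"),
    ("Knuth", "wrote", "TAOCP"),
    ("Knuth", "drove", "Truck"),
]


def predicates_of(triples, subject):
    """All predicates that occur with the given subject (like: <subject> ?p [])."""
    return {p for s, p, _ in triples if s == subject}


# Two patterns sharing ?p keep only the predicates present on both sides.
shared = predicates_of(TRIPLES, "Einstein") & predicates_of(TRIPLES, "Knuth")
print(sorted(shared))  # ['born', 'influenced']
```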

Discussion

Is it possible to use this VALUES feature to expose an intersection rather than a union of the predicates? Or am I misunderstanding what VALUES are for?

VALUES is essentially another set of bindings that gets joined with the other bindings, so it can't compute an intersection for you the way that you'd like. However, an "intersection" of the sort you're looking for here isn't too hard to get:

select distinct ?outpred ?label
{
  dbp:Albert_Einstein ?outpred [] .
  dbp:Donald_Knuth ?outpred [] .
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
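
To make the point about VALUES being "just another set of bindings" concrete, here is a small Python sketch of a join between a VALUES block and the matches of ?sub ?pred []. It is a simplified model of the join, not how any particular SPARQL engine is implemented, and the extra `Kirby` triple is an assumption added only to show that VALUES restricts the subjects.

```python
# Toy triples from the question, plus one extra subject (added for illustration).
TRIPLES = [
    ("Einstein", "influenced", "John"),
    ("Einstein", "influenced", "Knuth"),
    ("Einstein", "born", "Mars"),
    ("Einstein", "died", "Los Angeles"),
    ("Knuth", "influenced", "Kirby"),
    ("Knuth", "born", "Mars"),
    ("Knuth", "wrote", "TAOCP"),
    ("Knuth", "drove", "Truck"),
    ("Kirby", "born", "Earth"),
]


def join(left, right):
    """Join two lists of solution mappings (dicts) on the variables they share."""
    out = []
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


values_block = [{"sub": "Einstein"}, {"sub": "Knuth"}]    # values ?sub { ... }
pattern = [{"sub": s, "pred": p} for s, p, _ in TRIPLES]  # matches of ?sub ?pred []

rows = join(values_block, pattern)
distinct_rows = sorted({(r["sub"], r["pred"]) for r in rows})
for row in distinct_rows:
    print(row)
```

Each VALUES row is joined with its own matches independently, so the output contains every predicate of Einstein and every predicate of Knuth: the union-like result from the question, with Kirby filtered out.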

Now, that said, that could be a lot of triple patterns to write, so you might want a query where the only thing you have to change is a list of values. You can specify the values, group by the property and label (i.e., the non-values variables), and keep only those solutions for which count(distinct ?sub) equals the number of values that you specified. E.g.:

select distinct ?outpred ?label
{
  values ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
  ?sub ?outpred [] .
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
group by ?outpred ?label
having count(distinct ?sub) = 2

This way, in order to get count(distinct ?sub) to be 2, you must have had ?sub ?outpred [] match for both ?sub = Einstein and ?sub = Knuth.
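
Since the number in the HAVING clause has to match the number of values, it can be convenient to generate both from one list. The following Python sketch builds the query text for any list of subject IRIs. The function name `intersection_query` is hypothetical, the IRIs are assumed to be well-formed (no escaping or validation is done), and the HAVING condition is wrapped in parentheses, which is the bracketed form the SPARQL 1.1 grammar gives for a comparison like this.

```python
def intersection_query(subject_iris):
    """Build the group by / having query text for a list of subject IRIs."""
    subjects = list(dict.fromkeys(subject_iris))  # drop duplicates, keep order
    if not subjects:
        raise ValueError("need at least one subject IRI")
    values = " ".join(f"<{iri}>" for iri in subjects)
    return (
        "select distinct ?outpred ?label\n"
        "{\n"
        f"  values ?sub {{ {values} }}\n"
        "  ?sub ?outpred [] .\n"
        "  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .\n"
        "}\n"
        "group by ?outpred ?label\n"
        # The threshold always equals the number of distinct subjects listed.
        f"having (count(distinct ?sub) = {len(subjects)})\n"
    )


query = intersection_query([
    "http://dbpedia.org/resource/Albert_Einstein",
    "http://dbpedia.org/resource/Donald_Knuth",
])
print(query)
```

The resulting string can be pasted into an endpoint's query form or sent with whatever SPARQL client you already use.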

Checking the Approach

We can use the DBpedia endpoint to work through these. First, a simplified query:

select distinct ?s ?p where {
  values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
  ?s ?p []
}

SPARQL results

s                                             p
http://dbpedia.org/resource/Albert_Einstein   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://dbpedia.org/resource/Donald_Knuth      http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://dbpedia.org/resource/Albert_Einstein   http://www.w3.org/2002/07/owl#sameAs
http://dbpedia.org/resource/Donald_Knuth      http://www.w3.org/2002/07/owl#sameAs
⋮                                            ⋮

Now, it doesn't make sense to ask for an intersection while we're still selecting ?s, because Einstein ≠ Knuth, so there's never any intersection. But we can take an intersection on ?p. Here's a query that gets all the properties for which both have values:

select distinct ?p where {
  dbpedia:Albert_Einstein ?p [] .
  dbpedia:Donald_Knuth ?p []
}

A similar query counts the results for us:

select (count(distinct ?p) as ?np) where {
  dbpedia:Albert_Einstein ?p [] .
  dbpedia:Donald_Knuth ?p [] .
}

There are 45 properties that they both have.

The group by query is:

select distinct ?p where {
  values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
  ?s ?p []
}
group by ?p
having count(distinct ?s) = 2

Now let's make sure that the group by approach returns the same number of results:

select (count(*) as ?np) where {
  select distinct ?p where {
    values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
    ?s ?p []
  }
  group by ?p
  having count(distinct ?s) >= 2
}

This also returns 45, so we see that we get the same results.
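
One detail worth spelling out: the distinct inside the count matters whenever a subject has several values for the same property, because ?s ?p [] matches once per triple. A short Python sketch on the question's toy triples (an illustration only, not DBpedia data) shows how a plain count and a distinct count can disagree:

```python
from collections import Counter, defaultdict

# Toy triples from the question's simplified example (not real DBpedia data).
TRIPLES = [
    ("Einstein", "influenced", "John"),
    ("Einstein", "influenced", "Knuth"),
    ("Einstein", "born", "Mars"),
    ("Einstein", "died", "Los Angeles"),
    ("Knuth", "influenced", "Kirby"),
    ("Knuth", "born", "Mars"),
    ("Knuth", "wrote", "TAOCP"),
    ("Knuth", "drove", "Truck"),
]

# count(?s): one per matching triple, so repeated subjects are counted again.
plain = Counter(p for _, p, _ in TRIPLES)

# count(distinct ?s): each subject is counted at most once per predicate.
subjects_by_pred = defaultdict(set)
for s, p, _ in TRIPLES:
    subjects_by_pred[p].add(s)

plain_hits = sorted(p for p, n in plain.items() if n == 2)
distinct_hits = sorted(p for p, subs in subjects_by_pred.items() if len(subs) == 2)

print(plain_hits)     # ['born']
print(distinct_hits)  # ['born', 'influenced']
```

Here `influenced` has three matching triples, so a plain count of exactly 2 misses it even though both subjects have that predicate.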