SPARQL Federated Query takes too long to respond

I have a SPARQL federated query that joins data from Wikidata and DBpedia. With only the first two SERVICE blocks, it runs in a reasonable amount of time. However, when I add the third SERVICE block, it takes far too long. The third block is meant to take the entities obtained from the first two blocks and keep only those that are a 'subclass of' 'percussion instrument'.

Here is my query (it should return percussion instruments from the Middle East):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbp: <http://dbpedia.org/property/>



SELECT DISTINCT
        ?instrument
        (?countryDbpediaID)
        (?country as ?wikidataID)
        (?countryLabel as ?origin)
WHERE {
          SERVICE <https://query.wikidata.org/sparql>
          {
              SELECT DISTINCT ?country ?countryLabel
              WHERE {
                        ?country wdt:P361 wd:Q7204 .

                        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
                    }
          }

          SERVICE <https://dbpedia.org/sparql>
          {
              SELECT DISTINCT ?intermediateEntityDbpediaID
                ?intermediateEntityWikidataUri
                ?intermediateEntityWikidataID
                ?countryDbpediaID ?description
              FROM <http://dbpedia.org>
                   WHERE { ?countryDbpediaID owl:sameAs ?country;
                                      rdfs:label ?label ;
                                      foaf:depiction ?image;
                                      rdfs:comment ?description .
                            ?intermediateEntityDbpediaID dbp:origin ?countryDbpediaID;
                                               rdfs:label ?intermediateEntityLabel ;
                                               owl:sameAs ?intermediateEntityWikidataUri .


                           FILTER (LANG(?label) = "en")
                           FILTER (LANG(?intermediateEntityLabel) = "en")
                           FILTER (STRSTARTS(STR(?intermediateEntityWikidataUri), STR('http://www.wikidata.org')))
                           FILTER (LANG(?description) = "en")
                           BIND(REPLACE(STR(?intermediateEntityWikidataUri),"http://www.wikidata.org/entity/","","i") AS ?intermediateEntityWikidataID)
                         }
          }

          SERVICE <https://query.wikidata.org/sparql>
            {
                SELECT DISTINCT ?instrument
                WHERE {
                          ?instrument wdt:P279 wd:Q133163 .

                          FILTER (?instrument in (URI(?intermediateEntityWikidataUri)))

                          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
                      }
            }



}

I found this related question, but it didn't help me: "SPARQL Speed up federated query".

Is there any way to optimize this query?

Answer by Gregory Williams:

The three federated queries here do not have any shared join variables. The only variables being returned from them (in the sub-queries' SELECT DISTINCT clauses) are all disjoint. That means that evaluation is performing two cartesian joins.

The three sub-queries return 21, 504, and 0 results, respectively. So I think the end result would be zero rows returned. But the query engine may be taking a very sub-optimal route towards that answer and timing out.
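To make the cartesian-join point concrete, here is a minimal sketch that uses inline VALUES data instead of the remote endpoints. The variable names ?a and ?b are invented for the illustration. Because the two sub-queries project no variable in common, every row of one is paired with every row of the other:

```sparql
# Two sub-queries with disjoint projected variables.
# With no shared variable to join on, the result is a cross product:
# 3 values of ?a times 2 values of ?b gives 6 rows.
SELECT ?a ?b
WHERE {
  { SELECT ?a WHERE { VALUES ?a { 1 2 3 } } }
  { SELECT ?b WHERE { VALUES ?b { "x" "y" } } }
}
```

By the same rule, if any one of the joined sub-queries is empty, the product is empty too, which is why 21, 504, and 0 results combine to zero rows.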

Update:

Given the repeated use of variables like ?intermediateEntityWikidataUri, I suspect these are intended to be used to join data across the federated sub-queries. But as written, the query can't do that. For example, given SPARQL's bottom-up semantics, you can't use ?intermediateEntityWikidataUri in the FILTER of the third query without that variable being bound in the same scope.
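The scoping rule can be shown with a toy query on inline data rather than the endpoints from the question. The variable names ?x and ?y are invented for the example, and the sketch assumes an engine that follows the standard bottom-up evaluation:

```sparql
# ?y is bound in the outer pattern only.
# The inner SELECT is evaluated on its own, so ?y is unbound there,
# the IN test raises an error, and the FILTER rejects every ?x.
SELECT ?x
WHERE {
  VALUES ?y { 1 }
  {
    SELECT ?x
    WHERE {
      VALUES ?x { 1 2 }
      FILTER (?x IN (?y))
    }
  }
}
```

Under those semantics the inner sub-query yields no solutions, so the whole query returns no rows even though ?x = 1 looks like it should match. An engine that pushes outer bindings into sub-queries could behave differently.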

No matter what else is in the query, this:

  SERVICE <https://query.wikidata.org/sparql>
    {
        SELECT DISTINCT ?instrument
        WHERE {
                  ?instrument wdt:P279 wd:Q133163 .

                  FILTER (?instrument in (URI(?intermediateEntityWikidataUri)))
                  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
              }
    }

will return zero results: the filter expression is evaluated with ?intermediateEntityWikidataUri unbound, which raises an error, and a FILTER whose expression errors removes every solution.
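One possible restructuring, offered as an untested sketch rather than a verified fix, is to have each sub-query project the variables it should be joined on. Here the DBpedia block projects both ?country and ?instrument, so it shares a variable with each Wikidata block. The sketch assumes the Wikidata URIs in DBpedia's owl:sameAs links match the wd: entity URIs exactly, and it drops the label, image, and description patterns, the FROM clause, and the label service from the original for brevity, which may change the result set:

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?instrument ?countryDbpediaID ?country
WHERE {
  # Wikidata: entities that are part of (P361) the Middle East (Q7204).
  SERVICE <https://query.wikidata.org/sparql> {
    SELECT DISTINCT ?country
    WHERE { ?country wdt:P361 wd:Q7204 . }
  }

  # Wikidata: direct subclasses (P279) of percussion instrument (Q133163).
  SERVICE <https://query.wikidata.org/sparql> {
    SELECT DISTINCT ?instrument
    WHERE { ?instrument wdt:P279 wd:Q133163 . }
  }

  # DBpedia: links a country to entities originating there.
  # Projecting ?country and ?instrument gives the outer query
  # shared variables to join on, instead of a cartesian product.
  SERVICE <https://dbpedia.org/sparql> {
    SELECT DISTINCT ?country ?countryDbpediaID ?instrument
    WHERE {
      ?countryDbpediaID owl:sameAs ?country .
      ?instrumentDbpediaID dbp:origin ?countryDbpediaID ;
                           owl:sameAs ?instrument .
      FILTER (STRSTARTS(STR(?instrument), "http://www.wikidata.org/entity/"))
    }
  }
}
```

Whether this runs quickly depends on the federation engine. If it evaluates the DBpedia block on its own, without passing in the ?country and ?instrument bindings from the Wikidata blocks, that block may still be expensive, and it may need further constraints or a VALUES clause built from the earlier results.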