How can I reduce the run time of a SPARQL query on Wikidata?


I want to create a histogram of the births and deaths of people who have articles on English Wikipedia, but I keep running into query time limits on Wikidata.
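Once the rows are available, the histogram step itself is small. A minimal sketch, assuming the results arrive in the standard SPARQL 1.1 JSON results format (a list of bindings, each mapping a variable name to an object with a `value` string) and use the `?item`, `?_date_of_birth` and `?_date_of_death` variables of the query. The helper names `parse_year` and `year_histogram` are hypothetical, and the parsing assumes date values shaped like `1942-01-17T00:00:00Z`, with a leading `-` for BCE years.

```python
from collections import Counter
from typing import Iterable, Optional


def parse_year(value: str) -> Optional[int]:
    """Extract the year from an xsd:dateTime string like '1942-01-17T00:00:00Z'.

    Assumption: a leading '-' marks a BCE year (e.g. '-0099-01-01T00:00:00Z').
    Returns None when the string does not start with a numeric year.
    """
    sign = -1 if value.startswith("-") else 1
    year_part = value.lstrip("+-").split("-", 1)[0]
    return sign * int(year_part) if year_part.isdigit() else None


def year_histogram(bindings: Iterable[dict], var: str) -> Counter:
    """Count people per year for one variable, e.g. '_date_of_birth'.

    Each item is counted at most once (its first row with a usable date),
    because the OPTIONAL joins return several rows for a person who has
    more than one birth or death date.
    """
    counts: Counter = Counter()
    seen: set = set()
    for row in bindings:
        item = row["item"]["value"]
        cell = row.get(var)
        # Unbound OPTIONAL variables are absent from the row; non-literal
        # values (such as "unknown value" placeholders) are skipped.
        if item in seen or cell is None or cell.get("type") not in ("literal", "typed-literal"):
            continue
        year = parse_year(cell["value"])
        if year is not None:
            seen.add(item)
            counts[year] += 1
    return counts
```

Calling `year_histogram(rows, "_date_of_birth")` and `year_histogram(rows, "_date_of_death")` on the same rows gives the two histograms; deduplicating by item also protects against rows repeated across overlapping pages.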

I wrote the following query:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>

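# Every human (P31 "instance of" = Q5 "human") that has an English Wikipedia article,
# with the optional date of birth (P569) and date of death (P570).
SELECT ?item ?article ?_date_of_birth ?_date_of_death WHERE {
  ?item wdt:P31 wd:Q5.
  ?article schema:about ?item.
  ?article schema:isPartOf <https://en.wikipedia.org/>.
  OPTIONAL { ?item wdt:P569 ?_date_of_birth. }
  OPTIONAL { ?item wdt:P570 ?_date_of_death. }
}
LIMIT 10000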


This works fine on its own, but I want the whole list, so I page through the results by adding an OFFSET, and at around OFFSET 500000 the query starts hitting the time limit. The Wikidata manual says I should try to optimize my query, but is there any way to optimize this one? There are definitely more than 500,000 people on Wikipedia: the transclusions of the 'birth date' template alone number over 600,000.
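The paging approach described above can be sketched as follows, assuming the public endpoint `https://query.wikidata.org/sparql` accepts a form-encoded POST with `format=json`. `build_page_query`, `fetch_page` and `fetch_all` are hypothetical helper names, and the User-Agent string is a placeholder. Two general points explain part of the behaviour: each page re-runs the whole query, and the server generally still has to produce the rows it skips, so later pages get slower; and the SPARQL 1.1 specification notes that LIMIT/OFFSET subsets are only predictable with an ORDER BY, so without one, pages may overlap or miss rows.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service endpoint

# The question's query without its LIMIT; paging clauses are appended per request.
BASE_QUERY = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>

SELECT ?item ?article ?_date_of_birth ?_date_of_death WHERE {
  ?item wdt:P31 wd:Q5.
  ?article schema:about ?item.
  ?article schema:isPartOf <https://en.wikipedia.org/>.
  OPTIONAL { ?item wdt:P569 ?_date_of_birth. }
  OPTIONAL { ?item wdt:P570 ?_date_of_death. }
}"""


def build_page_query(offset: int, page_size: int = 10000) -> str:
    """Append LIMIT/OFFSET to the base query (hypothetical helper)."""
    return f"{BASE_QUERY}\nLIMIT {page_size}\nOFFSET {offset}"


def fetch_page(query: str, timeout: float = 70.0) -> list:
    """POST one query and return its bindings (SPARQL 1.1 JSON results format)."""
    data = urllib.parse.urlencode({"query": query, "format": "json"}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,  # supplying data makes urllib send a POST
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia's User-Agent policy asks for a descriptive agent; placeholder here.
            "User-Agent": "histogram-example/0.1 (placeholder contact)",
        },
    )
    # Client-side socket timeout, chosen arbitrarily for this sketch.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


def fetch_all(page_size: int = 10000) -> list:
    """Request pages with increasing OFFSET until a short page comes back."""
    rows: list = []
    offset = 0
    while True:
        page = fetch_page(build_page_query(offset, page_size))
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size
```

This reproduces the pattern from the question rather than fixing it: `fetch_all` will hit the same timeout once the offset grows large enough.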

I have also tried DBpedia, but some of its data is out of date; for example, Muhammad Ali has no death date on DBpedia.

I've also tried not filtering for English articles, i.e. requesting articles in all languages and filtering on my end, but the same scaling problem appears, albeit at a much higher offset.
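The client-side variant described above might look like this minimal sketch. It assumes the query simply drops the `schema:isPartOf <https://en.wikipedia.org/>` triple so that every sitelink comes back as `?article`, and relies on the `wd:`, `wdt:` and `schema:` prefixes being predefined on the Wikidata Query Service. `ALL_SITELINKS_QUERY` and `keep_english` are hypothetical names; the filter keys on English Wikipedia article URLs starting with `https://en.wikipedia.org/wiki/`.

```python
from typing import Iterable, List

# Hypothetical variant of the question's query: without the schema:isPartOf triple,
# every sitelink of each person (any language, any project) is returned as ?article.
ALL_SITELINKS_QUERY = """\
SELECT ?item ?article ?_date_of_birth ?_date_of_death WHERE {
  ?item wdt:P31 wd:Q5.
  ?article schema:about ?item.
  OPTIONAL { ?item wdt:P569 ?_date_of_birth. }
  OPTIONAL { ?item wdt:P570 ?_date_of_death. }
}"""

# Prefix of English Wikipedia article URLs as they appear in ?article.
EN_PREFIX = "https://en.wikipedia.org/wiki/"


def keep_english(bindings: Iterable[dict]) -> List[dict]:
    """Keep only rows whose ?article is an English Wikipedia page.

    Rows without an ?article binding, or pointing at another language or
    project (e.g. en.wikiquote.org), are dropped.
    """
    return [
        row for row in bindings
        if row.get("article", {}).get("value", "").startswith(EN_PREFIX)
    ]
```

The trade-off is visible in the query: each person now produces one row per sitelink, so the server returns many more rows than the filtered query, which fits the observation that the limit is merely pushed to a higher offset.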

