I would like to extract the distinct names of all persons, i.e. named entities that are human, from Wikidata with Python. I have tried different libraries (qwikidata, mwikidata), different GET requests, and Wikidata's SPARQL service itself. After a while I understood that a general query like this:
SELECT ?person ?personLabel
WHERE {
?person wdt:P31 wd:Q5 .
?person rdfs:label ?personLabel. FILTER( LANG(?personLabel) IN ("de", "en") )
}
is too large for the public API. I then added a combination of LIMIT and OFFSET at the end of the query, e.g.:
ORDER BY ASC(?personLabel)
LIMIT 10000 OFFSET 10000
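A minimal sketch of how such pages could be generated in Python. The names `QUERY_TEMPLATE`, `build_paged_query` and `paged_queries` are hypothetical helpers, not part of qwikidata or any other library, and the language filter is written with `IN` on the assumption that labels in either German or English are wanted:

```python
# Hypothetical helpers for illustration; nothing here contacts the endpoint.
QUERY_TEMPLATE = """
SELECT ?person ?personLabel
WHERE {{
  ?person wdt:P31 wd:Q5 .
  ?person rdfs:label ?personLabel .
  FILTER( LANG(?personLabel) IN ("de", "en") )
}}
ORDER BY ASC(?personLabel)
LIMIT {limit} OFFSET {offset}
"""


def build_paged_query(limit: int, offset: int) -> str:
    """Return the SPARQL text for one page of results."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return QUERY_TEMPLATE.format(limit=limit, offset=offset).strip()


def paged_queries(page_size: int, pages: int):
    """Yield the query text for the first `pages` pages, in order."""
    for page in range(pages):
        yield build_paged_query(page_size, page * page_size)
```

Each page still asks the server to sort every matching label before it can skip to the offset, so a single page is probably not much cheaper than the full query.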
But no matter what I try, I get either a TimeOutError (from the Wikidata service) or json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) (from Python).
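The JSONDecodeError at char 0 means the body handed to the JSON parser was empty or did not start with JSON at all, so it is presumably the same failure as the timeout, just seen from the Python side. A minimal sketch of a check that surfaces what the server actually sent; `SparqlResponseError` and `decode_sparql_body` are hypothetical names, and the function assumes the standard SPARQL JSON results layout (`results` / `bindings`):

```python
import json


class SparqlResponseError(Exception):
    """Raised when the endpoint's reply cannot be used as a JSON result."""


def decode_sparql_body(status_code: int, content_type: str, body: str) -> list:
    """Return the result bindings, or raise with a readable excerpt of the body."""
    excerpt = body[:200]  # enough to recognise an error page or a timeout message
    if status_code != 200:
        raise SparqlResponseError(f"HTTP {status_code}: {excerpt!r}")
    if "json" not in content_type.lower():
        raise SparqlResponseError(
            f"unexpected content type {content_type!r}: {excerpt!r}"
        )
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # An empty or truncated body ends up here instead of crashing the caller.
        raise SparqlResponseError(f"body is not valid JSON: {excerpt!r}") from exc
    return payload["results"]["bindings"]
```

With the requests library, the three arguments would be `resp.status_code`, `resp.headers.get("Content-Type", "")` and `resp.text`.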
One idea is to generate multiple datasets by splitting on the sex or gender property (P21), but for male and female the same problem persists.
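A minimal sketch of that split, assuming the item IDs Q6581097 (male) and Q6581072 (female) as P21 values; `P21_VALUES` and `build_subset_query` are hypothetical names used only for illustration:

```python
# Assumed Wikidata item IDs for two values of P21.
P21_VALUES = {"male": "wd:Q6581097", "female": "wd:Q6581072"}


def build_subset_query(p21_value: str) -> str:
    """Return the person query restricted to one P21 value."""
    lines = [
        "SELECT ?person ?personLabel",
        "WHERE {",
        "  ?person wdt:P31 wd:Q5 .",
        f"  ?person wdt:P21 {p21_value} .",
        "  ?person rdfs:label ?personLabel .",
        '  FILTER( LANG(?personLabel) IN ("de", "en") )',
        "}",
    ]
    return "\n".join(lines)


# One query per subset, keyed by a readable name.
subset_queries = {name: build_subset_query(qid) for name, qid in P21_VALUES.items()}
```

Persons without a P21 statement, or with any other value, fall outside both subsets, so this split would not cover everyone even if the queries did finish.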
Help is much appreciated!