How can I extract the name of all persons in Wikidata with Python?

I would like to extract all distinct names of all persons in Wikidata, i.e., items that are instances of human (Q5), with Python. I have tried different libraries (qwikidata, mwikidata), different GET requests, and Wikidata's SPARQL query service itself. After a while I realized that a general query like this:

    SELECT ?person ?personLabel WHERE {
        ?person wdt:P31 wd:Q5 .
        ?person rdfs:label ?personLabel .
        FILTER( LANG(?personLabel) IN ("de", "en") )
    }

is too large for the public endpoint to answer within its time limit. Then I added ORDER BY, LIMIT and OFFSET at the end of the query to fetch the results page by page, e.g.:

    ORDER BY ASC(?personLabel)
    LIMIT 10000 OFFSET 10000
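A minimal sketch of that paging loop in Python, using only the standard library (urllib) rather than the libraries mentioned above. The endpoint https://query.wikidata.org/sparql with format=json is the public Wikidata Query Service; the User-Agent string, the function names and the "stop at the first empty page" rule are illustrative assumptions. The fetch function is a parameter so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"
# Assumption: placeholder User-Agent; Wikidata asks clients to identify themselves.
USER_AGENT = "person-name-extractor/0.1 (contact: you@example.org)"


def build_page_query(limit, offset):
    """Return the question's query with ORDER BY / LIMIT / OFFSET paging appended."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 .
  ?person rdfs:label ?personLabel .
  FILTER( LANG(?personLabel) IN ("de", "en") )
}}
ORDER BY ASC(?personLabel)
LIMIT {limit} OFFSET {offset}
"""


def fetch_json(query):
    """Send one SPARQL query to the public endpoint and decode the JSON result (network call)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=70) as response:
        return json.load(response)


def iter_pages(page_size=10000, fetch=fetch_json):
    """Yield one list of (person URI, label) pairs per page until a page comes back empty."""
    offset = 0
    while True:
        data = fetch(build_page_query(page_size, offset))
        rows = [
            (b["person"]["value"], b["personLabel"]["value"])
            for b in data["results"]["bindings"]
        ]
        if not rows:
            return
        yield rows
        offset += page_size
```

This only automates what the question already does: each page re-runs the whole query on the server, and ORDER BY means every matching label presumably has to be sorted before OFFSET rows can be skipped, so later pages would likely hit the same timeout.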

But no matter what I try, I get either a timeout error from the Wikidata Query Service or, in Python, json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
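The JSONDecodeError presumably does not come from the data itself: when the endpoint gives up on a query it typically answers with an error status and a plain-text or empty body, and decoding that body as JSON fails at the very first character. A minimal sketch, assuming the standard library's urllib instead of whichever HTTP library was actually used, that reports what the endpoint really sent back; SparqlError, decode_result and run_query are hypothetical names.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
# Assumption: placeholder User-Agent string.
USER_AGENT = "person-name-extractor/0.1 (contact: you@example.org)"


class SparqlError(RuntimeError):
    """Raised when the endpoint answers with something other than a JSON result."""


def decode_result(status, body):
    """Decode a SPARQL JSON response, or raise SparqlError showing what actually came back."""
    text = body.decode("utf-8", errors="replace")
    if status != 200:
        raise SparqlError(f"HTTP {status}: {text[:300]}")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # An empty or plain-text body is what produces
        # "Expecting value: line 1 column 1 (char 0)".
        raise SparqlError(f"non-JSON body: {text[:300]!r}") from exc


def run_query(query, timeout=70):
    """Run a query against the public endpoint (network call)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return decode_result(response.status, response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error body usually says why the query failed.
        body = err.read().decode("utf-8", errors="replace")
        raise SparqlError(f"HTTP {err.code}: {body[:300]}") from err
```

With this in place, a failing page shows the server's own message (for example a timeout notice) instead of a generic decoding error.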

One idea was to split the query into several datasets by the sex or gender property (P21), but the same problems persist for both the male and the female subsets.
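The P21 split can be written as one query per value. This sketch assumes the usual Wikidata items for the two values mentioned, male (Q6581097) and female (Q6581072); SEX_VALUES, build_partition_query and its languages parameter are hypothetical names used only for illustration.

```python
# Assumption: Wikidata items for the two P21 values discussed in the question.
SEX_VALUES = {
    "male": "wd:Q6581097",
    "female": "wd:Q6581072",
}


def build_partition_query(sex_item, languages=("de", "en")):
    """Restrict the person query to humans with one value of P21 (sex or gender)."""
    langs = ", ".join(f'"{lang}"' for lang in languages)
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 {sex_item} ;
          rdfs:label ?personLabel .
  FILTER( LANG(?personLabel) IN ({langs}) )
}}
"""


if __name__ == "__main__":
    # Print one query per partition.
    for name, item in SEX_VALUES.items():
        print(name, build_partition_query(item))
```

Each partition still covers a large share of all humans, which fits the observation that this split alone does not get under the time limit.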

Help is much appreciated!
