How can I extract the names of all persons in Wikidata with Python?

I would like to extract the distinct names of all persons, i.e., named entities that are human, in Wikidata with Python. I have tried different libraries (qwikidata, mwikidata), different GET requests, and Wikidata's SPARQL service itself. After a while I understood that a general query like this:

SELECT ?person ?personLabel
WHERE {
    ?person wdt:P31 wd:Q5 .
    ?person rdfs:label ?personLabel .
    FILTER( LANG(?personLabel) IN ("de", "en") )
}

is too large for the public API. I then added a combination of LIMIT and OFFSET at the end of the query, e.g.:

ORDER BY ASC(?personLabel)
LIMIT 10000 OFFSET 10000
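For reference, a minimal sketch of how such paged queries could be generated in Python. The helper name `build_page_query` and its parameters are hypothetical, not part of any library. The sketch leaves out ORDER BY on the assumption that sorting forces the service to materialize every match before it can apply LIMIT, which is a likely contributor to the timeouts; without ORDER BY, however, the page order is not guaranteed to be stable between requests. It also tests the label language with IN, because LANG() returns a single language tag per label.

```python
def build_page_query(limit, offset, languages=("de", "en")):
    """Return one page of the 'all humans with labels' query (hypothetical helper)."""
    # Render the language tags as a SPARQL list: "de", "en"
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    return (
        "SELECT ?person ?personLabel WHERE {\n"
        "  ?person wdt:P31 wd:Q5 .\n"
        "  ?person rdfs:label ?personLabel .\n"
        f"  FILTER(LANG(?personLabel) IN ({lang_list}))\n"
        "}\n"
        f"LIMIT {limit} OFFSET {offset}"
    )


# Build the first three pages of 10000 rows each.
page_size = 10000
page_queries = [build_page_query(page_size, page * page_size) for page in range(3)]
```

Each string in `page_queries` can then be sent to the query service one at a time.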

But no matter what I try, I get either a TimeOutError (from the Wikidata service) or json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) (from Python).
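A JSONDecodeError at "line 1 column 1 (char 0)" usually means the response body was not JSON at all, for example an empty body or a plain-text error page. The sketch below, using only the standard library, checks the status code and content type before decoding, so that a server-side failure is reported as such. The function names, the timeout value, and the User-Agent string are placeholders chosen for illustration; the endpoint URL and the Accept header are the ones the public query service is generally documented to use.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def decode_bindings(status, content_type, body):
    """Return the result rows, or raise if the response is not a JSON result."""
    if status != 200 or "json" not in content_type.lower():
        # Show the start of the body: it often contains the real error message.
        raise RuntimeError(
            f"unexpected response (HTTP {status}, {content_type!r}): {body[:200]!r}"
        )
    return json.loads(body)["results"]["bindings"]


def run_query(query, timeout=70):
    """Send one SPARQL query and return its bindings (hypothetical helper)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder value; replace with a descriptive agent and contact.
            "User-Agent": "person-names-example/0.1 (placeholder contact)",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8")
            content_type = response.headers.get("Content-Type", "")
            return decode_bindings(response.status, content_type, body)
    except urllib.error.HTTPError as error:
        # Non-2xx responses arrive here; surface them with the same check.
        body = error.read().decode("utf-8", errors="replace")
        content_type = error.headers.get("Content-Type", "") if error.headers else ""
        return decode_bindings(error.code, content_type, body)
```

With this in place, a timeout on the server side should surface as a RuntimeError carrying the HTTP status and the first characters of the body, instead of an opaque JSON decoding failure.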

One idea is to generate multiple datasets using the sex or gender property (P21), but for male and female the same problems persist.
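A minimal sketch of that kind of split, assuming the items Q6581097 (male) and Q6581072 (female) as the P21 values. The helper name `build_gender_query` is hypothetical. Each partition still contains a very large number of people, which may explain why the split alone does not avoid the timeout, and humans without a P21 statement would not appear in either partition.

```python
# P21 values used for the split (assumed to cover the two cases in question).
GENDER_ITEMS = {"male": "Q6581097", "female": "Q6581072"}


def build_gender_query(gender_qid, limit, offset=0):
    """Return the humans-with-labels query restricted to one P21 value."""
    return (
        "SELECT ?person ?personLabel WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f"          wdt:P21 wd:{gender_qid} ;\n"
        "          rdfs:label ?personLabel .\n"
        '  FILTER(LANG(?personLabel) IN ("de", "en"))\n'
        "}\n"
        f"LIMIT {limit} OFFSET {offset}"
    )


# One query per partition, first page only.
queries = {name: build_gender_query(qid, 10000) for name, qid in GENDER_ITEMS.items()}
```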

Help is much appreciated!
