I'm trying to query Wiktionary with SPARQL to get all terms that are nouns in a certain language (for example German), with the following output:
- the string of the noun
- the grammatical gender (genus): masculine, feminine, or neuter
I am using the SPARQL endpoint http://wiktionary.dbpedia.org/sparql. I found an example query, but I couldn't figure out how to adapt it to get the information I want.
PREFIX terms: <http://wiktionary.dbpedia.org/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?sword ?slang ?spos ?ssense ?twordRes ?tword ?tlang
FROM <http://wiktionary.dbpedia.org>
WHERE {
    ?swordRes terms:hasTranslation ?twordRes .
    ?swordRes rdfs:label ?sword .
    ?swordRes dc:language ?slang .
    ?swordRes terms:hasPoS ?spos .
    OPTIONAL { ?swordRes terms:hasMeaning ?ssense . }
    OPTIONAL {
        ?twordBaseRes terms:hasLangUsage ?twordRes .
        ?twordBaseRes rdfs:label ?tword .
    }
    OPTIONAL { ?twordRes dc:language ?tlang . }
}
First of all, you want to select all term senses that are nouns. As you can see in the result of the example query, this information is captured by the terms:hasPoS relation. So, to query specifically for nouns, we can constrain the value of that relation to the noun part of speech.
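A minimal sketch of that noun constraint, assuming the noun part of speech is identified by the resource terms:Noun (an assumption; check the ?spos values returned by the example query). The LIMIT is only there to keep the result small while experimenting.

```sparql
PREFIX terms: <http://wiktionary.dbpedia.org/terms/>

# Assumption: nouns are tagged with the resource terms:Noun.
SELECT ?swordRes
FROM <http://wiktionary.dbpedia.org>
WHERE {
    ?swordRes terms:hasPoS terms:Noun .
}
LIMIT 100
```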
The next thing you want is only nouns of a certain language. This seems to be covered by the dc:language relation, so we add an additional constraint on that relation. Let's say we want all English nouns.
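The language constraint could look roughly like this, assuming languages are identified by resources such as terms:English, by analogy with the ?slang values in the example query's output (both terms:Noun and terms:English are assumptions to verify against the data):

```sparql
PREFIX terms: <http://wiktionary.dbpedia.org/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Assumption: terms:Noun and terms:English are the identifiers used in the data.
SELECT ?swordRes
FROM <http://wiktionary.dbpedia.org>
WHERE {
    ?swordRes terms:hasPoS terms:Noun .
    ?swordRes dc:language terms:English .
}
LIMIT 100
```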
So, we are now selecting what you want, but we don't yet have the output in the format you want: the above query just gives back the identifier of the term sense, not the string value of the actual term. As we can see in the output from the example query, the string value is captured by the rdfs:label property, so we add that to the query.
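With the label added, the query might look like the sketch below. The variable name ?sword is reused from the example query; terms:Noun and terms:English remain assumptions about how the data identifies the part of speech and the language.

```sparql
PREFIX terms: <http://wiktionary.dbpedia.org/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Assumption: terms:Noun and terms:English are the identifiers used in the data.
SELECT ?swordRes ?sword
FROM <http://wiktionary.dbpedia.org>
WHERE {
    ?swordRes terms:hasPoS terms:Noun .
    ?swordRes dc:language terms:English .
    ?swordRes rdfs:label ?sword .   # the string value of the term
}
LIMIT 100
```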
If you now look at this query's result, you'll see that something odd is going on with the language: although we thought we selected English, we are also getting back labels with a different language tag (e.g. '@ru'). To remove these results we can restrict the query further and say that we only want labels whose language tag is English.
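One way to express that restriction is a FILTER on the label's language tag, using the standard SPARQL functions lang() and langMatches(). This is a sketch under the same assumptions as before (terms:Noun and terms:English as identifiers); for German you would presumably swap in the corresponding language resource and the tag "de".

```sparql
PREFIX terms: <http://wiktionary.dbpedia.org/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Assumption: terms:Noun and terms:English are the identifiers used in the data.
SELECT ?swordRes ?sword
FROM <http://wiktionary.dbpedia.org>
WHERE {
    ?swordRes terms:hasPoS terms:Noun .
    ?swordRes dc:language terms:English .
    ?swordRes rdfs:label ?sword .
    # Keep only labels tagged as English (matches "en", "en-GB", ...).
    FILTER ( langMatches(lang(?sword), "en") )
}
LIMIT 100
```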
Finally, the gender/genus. Here I'm not really sure. Looking at some example resources in the Wiktionary data (for example, the entry for "dog"), I'd say this information is not actually present in the data.
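If you want to check this yourself rather than browse individual resources, a rough exploratory query is to list the distinct properties used on nouns of your language and look for anything gender-related. The identifiers terms:Noun and terms:German are assumptions here, and the query may be slow on a large dataset.

```sparql
PREFIX terms: <http://wiktionary.dbpedia.org/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Assumption: terms:Noun and terms:German are the identifiers used in the data.
# Lists every property that appears on at least one German noun resource.
SELECT DISTINCT ?property
FROM <http://wiktionary.dbpedia.org>
WHERE {
    ?swordRes terms:hasPoS terms:Noun .
    ?swordRes dc:language terms:German .
    ?swordRes ?property ?value .
}
LIMIT 200
```

If no property in the result looks like a gender or genus relation, that supports the impression that the information is not in the dataset.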