Finding common categories or supercategories of resources

I'm wondering whether we can find out if two resources in DBpedia share a category, or belong to categories that have some common supercategory. I tried this query on the DBpedia endpoint, but it's wrong:

select distinct ?s ?s2 where {
?s skos:subject <http :// dbpedia.org/resource/ Category ?c.
?s2 skos:subject <http :// dbpedia.org/resource/ Category ?c2.
?c=?c2.
}

Joshua Taylor On

DBpedia doesn't use skos:subject for resources, but rather relates resources to their Wikipedia categories using dcterms:subject. You can find out what data is available by browsing the resource pages. E.g., you might have a look at http://dbpedia.org/resource/Mount_Monadnock. If you want to find categories that two resources have in common, just use the same variable. E.g.,

?subject1 dcterms:subject ?category .
?subject2 dcterms:subject ?category .

You can write that more concisely with the ^property notation and object lists. Writing o ^p s is the same as writing s p o. Object lists let you write s p o1, o2 instead of s p o1 . s p o2 . Putting these together, we can write:

?category ^dcterms:subject ?subject1, ?subject2 .

E.g., here's a query that finds common categories of Mount Monadnock and Spofford Lake. There's just one result, Landforms of Cheshire County, New Hampshire, since they only have one category in common.

select * where {
  ?category ^dcterms:subject dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
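
Outside SPARQL, reusing ?category in both triple patterns amounts to a set intersection. Here is a minimal Python sketch of that idea over toy data. Only the shared category name comes from the result described above; the other category names, the SUBJECTS dictionary, and the common_categories helper are invented for illustration and are not actual DBpedia data.

```python
# Toy stand-in for dcterms:subject triples. Only the shared category is taken
# from the answer; the "Example_*" categories are hypothetical placeholders.
SUBJECTS = {
    "Mount_Monadnock": {
        "Category:Landforms_of_Cheshire_County,_New_Hampshire",
        "Category:Example_Mountains",  # hypothetical
    },
    "Spofford_Lake": {
        "Category:Landforms_of_Cheshire_County,_New_Hampshire",
        "Category:Example_Lakes",  # hypothetical
    },
}


def common_categories(subjects, first, second):
    """Return the categories that both resources are directly filed under."""
    return subjects[first] & subjects[second]


shared = common_categories(SUBJECTS, "Mount_Monadnock", "Spofford_Lake")
print(sorted(shared))
```

With this toy data the intersection contains only the Cheshire County landforms category, mirroring the single row the query is said to return.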

Categories are related to their supercategories in DBpedia by skos:broader, as you can see at http://dbpedia.org/page/Category:Landforms_of_Cheshire_County,_New_Hampshire, where there are skos:broader links to that category's supercategories.

This means that if two things have some common category (or supercategory), each will be related to that category by a path starting with a dcterms:subject link and followed by zero or more skos:broader links. Thus, you could use a query like

select * where {
  ?category ^(dcterms:subject/skos:broader*) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}

Unfortunately, you'll find that the DBpedia endpoint runs into memory usage problems with that query, so you can't run it exactly like that. However, the DBpedia SPARQL endpoint supports a property path feature that didn't make it into the standard: you can write p{n,m} to denote a chain of at least n and at most m links. This means you can bound the path length and still get most of the same results as *:

select distinct ?category where {
  ?category ^(dcterms:subject/(skos:broader{0,3})) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
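
To see what the bounded path computes, here is a rough Python sketch of the same traversal on a tiny made-up hierarchy: start from a resource's direct categories, follow at most max_depth broader links, then intersect the two reachable sets. All names here (SUBJECTS, BROADER, categories_within, common_categories, and the resource and category labels) are hypothetical and only illustrate the semantics; this is not how the endpoint evaluates the query.

```python
# Hypothetical toy data: resource -> direct categories (dcterms:subject),
# and category -> parent categories (skos:broader).
SUBJECTS = {
    "resource_a": {"Category:Leaf_A"},
    "resource_b": {"Category:Leaf_B"},
}
BROADER = {
    "Category:Leaf_A": {"Category:Shared_Parent"},
    "Category:Leaf_B": {"Category:Shared_Parent"},
    "Category:Shared_Parent": {"Category:Grandparent"},
}


def categories_within(resource, subjects, broader, max_depth=None):
    """Categories reachable by dcterms:subject followed by 0..max_depth
    skos:broader links; max_depth=None mimics the unbounded `*` path."""
    seen = set(subjects.get(resource, ()))
    frontier = set(seen)
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = set()
        for category in frontier:
            for parent in broader.get(category, ()):
                if parent not in seen:  # visit each category only once
                    seen.add(parent)
                    next_frontier.add(parent)
        frontier = next_frontier
        depth += 1
    return seen


def common_categories(first, second, subjects, broader, max_depth=None):
    """Intersect the (super)categories reachable from both resources."""
    return (categories_within(first, subjects, broader, max_depth)
            & categories_within(second, subjects, broader, max_depth))


for bound in (0, 1, None):
    found = common_categories("resource_a", "resource_b", SUBJECTS, BROADER, bound)
    print(bound, sorted(found))
```

A smaller bound visits fewer categories but can miss shared supercategories that sit higher up, which is the trade-off behind scaling the {0,3} range up or down.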

This works with Tom Cruise and Madonna as well, though you'll need to scale back the path length a bit because of the memory issues. For instance, the following query returns seventy-four results.

select distinct ?category where {
  ?category
      ^(dcterms:subject/(skos:broader{0,2}))
          <http://dbpedia.org/resource/Tom_Cruise>,
          <http://dbpedia.org/resource/Madonna_(entertainer)> .
}

It's worth noting, though, that Wikipedia categories aren't types. So while both Mount Monadnock and Spofford Lake are rightly considered to be landforms, neither is a geography or, as you'll see in the results of the supercategory queries above, New Hampshire. Wikipedia categories are much more about topic than about a type hierarchy.

Related reading

There's a related (but not quite duplicate) question that you might find helpful as well: Using SPARQL to locate a subject with multiple occurrences of same property.