How do you count the entities of a type that share values in Wikidata or subparts of Wikidata?


I want to count how many entities in a selection share a property value with at least one other entity. For example, how many paintings share a 'depicts' (P180) value with another painting? My attempts with the following SPARQL query on WDQS (or on a small local subset of Wikidata) often time out.

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix wd: <http://www.wikidata.org/entity/>
SELECT (count(distinct ?entity1) as ?c)
WHERE {
  ?entity1 wdt:P31 <http://www.wikidata.org/entity/Q3305213>; wdt:P180 ?val  .
  ?entity2 wdt:P31 <http://www.wikidata.org/entity/Q3305213>; wdt:P180 ?val .
  filter(?entity1!=?entity2)
}

Is there a better way of formulating the SPARQL query to get a result?


logi-kal On BEST ANSWER

Since you don't use the variable ?entity2 in your select statement, a more efficient query would be the following one:

SELECT (count(distinct ?entity1) as ?c)
WHERE {
  ?entity1 wdt:P31 wd:Q3305213;
           wdt:P180 ?val .
  filter exists {
    ?entity2 wdt:P31 wd:Q3305213;
             wdt:P180 ?val .
    filter(?entity1 != ?entity2)
  }
}

Unfortunately, this seems to time out too.

Alternatively, you can use the following query:

SELECT (count(distinct ?entity) as ?countEntity)
WHERE {
  {
    SELECT ?val (count(distinct ?entity) as ?countVal)
    WHERE {
      ?entity wdt:P31 wd:Q3305213 ;
              wdt:P180 ?val .
      hint:SubQuery hint:runOnce true .
    }
    GROUP BY ?val
    HAVING (?countVal > 1)
  }
  hint:Prior hint:runFirst true .
  ?entity wdt:P31 wd:Q3305213 ;
          wdt:P180 ?val .
}

which runs in about 35 seconds.

Intuitively, the inner query retrieves every value ?val shared by at least two paintings (by checking that ?countVal > 1); the outer query then counts the entities that have at least one such value.
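
To make the two steps concrete, here is a minimal Python sketch of the same group-then-count logic on made-up data. The painting names and depicted values are hypothetical placeholders, not real Wikidata items, and the function name is only illustrative; it mirrors the idea of the query, not how the query engine evaluates it.

```python
from collections import defaultdict

# Hypothetical toy data: (painting, depicted value) pairs standing in for
# the wdt:P180 statements of paintings. All identifiers are made up.
statements = [
    ("painting_a", "horse"),
    ("painting_a", "tree"),
    ("painting_b", "horse"),
    ("painting_c", "river"),
    ("painting_d", "tree"),
    ("painting_e", "moon"),
]


def count_entities_sharing_a_value(pairs):
    """Count distinct entities that share at least one value with another entity."""
    # Step 1 (inner query): collect the distinct entities for each value.
    entities_by_value = defaultdict(set)
    for entity, value in pairs:
        entities_by_value[value].add(entity)

    # Keep only values used by more than one entity (HAVING ?countVal > 1).
    shared_values = {
        value for value, entities in entities_by_value.items() if len(entities) > 1
    }

    # Step 2 (outer query): count distinct entities having any shared value.
    return len({entity for entity, value in pairs if value in shared_values})


print(count_entities_sharing_a_value(statements))  # prints 3
```

Here "horse" and "tree" are the shared values, so painting_a, painting_b and painting_d are counted, while painting_c and painting_e are not.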