DBPedia queries missing certain chemical compounds


I am running this query to get a list of all compounds from the DBpedia public SPARQL endpoint.

SELECT * WHERE {
  ?y rdf:type dbpedia-owl:Drug.
  ?y rdfs:label ?Name .
  OPTIONAL {?y dbpedia-owl:iupacName ?iupacname} .
  OPTIONAL {?y dcterms:subject ?y1}
  FILTER (langMatches(lang(?Name),"en"))
}
LIMIT 50000

I am downloading the results in batches of 50000 (2 files) using the OFFSET parameter.

Somehow Isopropyl_alcohol is not covered by these results, even though a page for it exists at

and it has the properties that I am searching for?


There are 2 answers

Joshua Taylor On BEST ANSWER

There are two issues here. The first is that DBpedia Live and DBpedia do not have exactly the same content. According to the DBpedia Live webpage:

Wikipedia users constantly revise Wikipedia articles with updates happening almost each second. Hence, data stored in the official DBpedia endpoint can quickly become outdated, and Wikipedia articles need to be re-extracted. DBpedia Live enables such a continuous synchronization between DBpedia and Wikipedia.

That page also lists two SPARQL endpoints for DBpedia Live:

However, you'll run into issues on both. Isopropyl_alcohol is in DBpedia, and its URI is

Looking there, we see that Isopropyl alcohol doesn't have rdf:type dbpedia-owl:Drug, but only

so you won't be able to find it with your query on DBpedia, because it doesn't have the type dbpedia-owl:Drug. Now, Isopropyl_alcohol also exists in DBpedia Live, and its URL is

but it only has the following rdf:types:

so it won't be found by your query on DBpedia Live, for the same reason.

The second issue is the one that AndyS pointed out. Even if the query did select Isopropyl_alcohol in DBpedia or DBpedia Live, the LIMIT/OFFSET combination is not guaranteed to return it unless you provide an ordering constraint (ORDER BY). Without one, the server could legitimately return the same set of 50000 results every time.
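As a rough sketch of what a stable second batch could look like, here is the query from the question with an explicit ORDER BY added. The prefix declarations are written out as an assumption about what the endpoint predefines, and sorting on all four variables is just one way to get a deterministic order. A public endpoint may also enforce its own row cap or sort limits, so smaller batches might be needed.

```sparql
PREFIX rdf:         <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms:     <http://purl.org/dc/terms/>
# Assumption: dbpedia-owl: maps to the DBpedia ontology namespace.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT * WHERE {
  ?y rdf:type dbpedia-owl:Drug .
  ?y rdfs:label ?Name .
  OPTIONAL { ?y dbpedia-owl:iupacName ?iupacname }
  OPTIONAL { ?y dcterms:subject ?y1 }
  FILTER (langMatches(lang(?Name), "en"))
}
# Sorting on every projected variable makes successive slices
# non-overlapping and repeatable between requests.
ORDER BY ?y ?Name ?iupacname ?y1
LIMIT 50000
OFFSET 50000   # second batch; use OFFSET 0 for the first
```

Note that ordering only fixes the paging problem. A resource that lacks rdf:type dbpedia-owl:Drug will still not appear in any batch.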

AndyS On

Maybe it is not finding it in the LIMIT/OFFSET combination you are using. The server is not obliged to answer queries in the same order every time unless you use ORDER BY, so the slices you have may not in fact cover all results.

Maybe the SPARQL site and live.dbpedia are not in step.

Try asking directly for Isopropyl_alcohol.
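A direct lookup along those lines might look like the sketch below. It assumes the resource lives under the standard http://dbpedia.org/resource/ namespace, and it declares its prefixes explicitly rather than relying on whatever the endpoint predefines.

```sparql
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Assumption: DBpedia resources use this namespace.
PREFIX dbpedia: <http://dbpedia.org/resource/>

# List every rdf:type asserted for the resource, bypassing LIMIT/OFFSET entirely.
SELECT ?type WHERE {
  dbpedia:Isopropyl_alcohol rdf:type ?type .
}
```

If dbpedia-owl:Drug is not among the returned types, the original query cannot match the resource no matter how the results are sliced.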