After importing the dump "PathwayCommons12.All.BIOPAX.owl.gz" (linked from this page) of this Virtuoso triplestore, I noticed that a "#" has been inserted after the prefix of various URIs.
In particular, the following query runs on the original endpoint:
# Query 1
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX pfx: <http://pathwaycommons.org/pc12/>
select ?pw
where {
  ?pw a bp:Pathway .
  values ?pw { pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37 }
}
But to get it running on the local endpoint (the imported OWL dump), I have to add a "#" to the end of the pfx: namespace, like this:
# Query 2
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX pfx: <http://pathwaycommons.org/pc12/#>
select ?pw
where {
  ?pw a bp:Pathway .
  values ?pw { pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37 }
}
Note that Query 1 works only on the original endpoint, while Query 2 works only on the local endpoint.
What is going on here?
If we look at the first few lines of that massive RDF/XML file, we see:
Note the value of the rdf:ID attribute here: "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0". This is a relative URI, and needs to be resolved against the base URI (which is declared in the document header: "http://pathwaycommons.org/pc12/"). How this resolution is supposed to happen is described in section 2.14 of the RDF/XML syntax specification: an rdf:ID value is treated as a relative URI equivalent to "#" concatenated with the attribute value.
Example 16 in the specification illustrates this further.
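To make the difference concrete, here is a minimal Python sketch of the two resolution rules. It uses the standard library's urljoin as a stand-in for the relative-reference resolution an RDF/XML parser performs; the helper names resolve_rdf_id and resolve_rdf_about are made up for illustration and are not part of any RDF library.

```python
from urllib.parse import urljoin

# Base URI and ID taken from the dump discussed above.
BASE = "http://pathwaycommons.org/pc12/"
NODE_ID = "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0"


def resolve_rdf_id(base: str, node_id: str) -> str:
    """rdf:ID="x" is shorthand for the relative reference "#x"."""
    return urljoin(base, "#" + node_id)


def resolve_rdf_about(base: str, reference: str) -> str:
    """rdf:about="x" is resolved as an ordinary relative reference."""
    return urljoin(base, reference)


if __name__ == "__main__":
    print(resolve_rdf_id(BASE, NODE_ID))     # the form with "#"
    print(resolve_rdf_about(BASE, NODE_ID))  # the form without "#"
```

The first call yields the "#" form that the local import produces, and the second yields the form that the original endpoint answers to.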
What it comes down to is that in parsing this RDF/XML, the values supplied as rdf:ID attributes all resolve to http://pathwaycommons.org/pc12/#<ID>. So the result you're getting in GraphDB is correct for the given input. Why it is different in the Virtuoso endpoint I don't know: either they used a different input file, or they have a bug in their parser, or whatever tool was used to produce this dump file contains a bug.
It is probably safe to say that the intent of whoever created the dump file was that rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0" would resolve to the IRI http://pathwaycommons.org/pc12/ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0 (that is, without the added # character). There are several ways to fix this in the file: either replace all occurrences of rdf:ID with rdf:about, or else don't rely on relative URI resolution and just use the full URI as the rdf:about value (an rdf:ID value must be an XML name, so it cannot hold a full URI).
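As a rough sketch of the first option, the following Python streams the gzipped dump and rewrites the attributes with a regular expression. It assumes attributes are written literally as rdf:ID="..." (with the rdf: prefix bound to the RDF namespace) and that each attribute sits on a single line. It also assumes that the file may refer to these nodes as rdf:resource="#...", which would resolve to the "#" form as well, so the leading "#" is dropped there too. The function names and the output file name are hypothetical, and a real XML parser would be more robust than a regex.

```python
import gzip
import re

# rdf:ID="x" or rdf:ID='x'; group 1 is the quote character, group 2 the value.
RDF_ID = re.compile(r'\brdf:ID=(["\'])(.*?)\1')
# rdf:resource="#x": a fragment-only reference to a node in the same document.
RDF_RESOURCE_FRAGMENT = re.compile(r'\brdf:resource=(["\'])#')


def rewrite_line(line: str) -> str:
    """Make IDs and same-document references resolve to <base><ID>, without '#'."""
    line = RDF_ID.sub(r'rdf:about=\1\2\1', line)
    return RDF_RESOURCE_FRAGMENT.sub(r'rdf:resource=\1', line)


def rewrite_dump(src_path: str, dst_path: str) -> None:
    """Stream a gzipped RDF/XML file line by line and write a rewritten copy."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(rewrite_line(line))


if __name__ == "__main__":
    # The output name is an arbitrary choice for this example.
    rewrite_dump("PathwayCommons12.All.BIOPAX.owl.gz",
                 "PathwayCommons12.All.BIOPAX.fixed.owl.gz")
```

After importing the rewritten file, Query 1 rather than Query 2 should be the one that matches, provided the assumptions above hold for the dump.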