I'm using a federated query to retrieve some information from a remote server, but I don't want to retrieve all the variables (SELECT *) that I work with inside the federated query; I only want to return the count variable. How can I do that?
Code:
SERVICE <https://sparql.uniprot.org/sparql/> {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
If it were not a federated query, I would do it like this:
SELECT distinct (count(distinct ?protein) as ?count) WHERE {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
But in the federated query I cannot select variables, so is there a way to do what I want?
** EDIT 1 **
After @TallTed's response, I noticed that I skipped some details to keep the question simple, but those details turn out to be important, so I will describe the whole situation.
I have a local data set containing triples about biological processes and genes. I have to count how many genes are related to each biological process and divide that number by the total number of proteins identified in Uniprot for the same biological process (and its "children").
To do this, I first query my local data set, counting the genes for each biological process, and then I run a federated query to count all the proteins identified in Uniprot for each biological process (and its "children").
The full SPARQL code:
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?bp_iri ?bp_count (count(distinct ?protein) as ?bp_total) ((?bp_count / ?bp_total) as ?divided) WHERE {
{
SELECT DISTINCT ?bp_iri (COUNT(?bp_iri) as ?bp_count) WHERE{
?genes_iri a uniprot:Gene .
?genes_iri obo:RO_0000056 ?bp_iri .
}group by ?bp_iri order by DESC(?bp_count)
}
SERVICE silent <https://sparql.uniprot.org/sparql/> {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
}group by ?bp_iri ?bp_count ?bp_total order by DESC(?divided)
When I run this query using Jena ARQ (a query engine), the variable ?bp_iri is replaced at the moment of the HTTP request by a specific biological process IRI (one HTTP request for each biological process), as shown in the image below:
Note that in the explain image, the federated query is selecting everything (*). The problem is that I don't want to retrieve all the relations that I'm dealing with in the federated query; I just want to retrieve the count, but COUNT is an aggregate function that has to be projected in a SELECT clause. (I don't want to retrieve all the relations because this query returns A LOT of triples (on the order of tens of thousands, sometimes millions), and it's not necessary to have them on my computer just to count them.)
To solve this, I tried to create a subquery inside the federated query to select only the count (?bp_total) and not all the triples. Code used:
SERVICE silent <https://sparql.uniprot.org/sparql/> {
{
SELECT (count(distinct ?protein) as ?bp_total) WHERE {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
}
}
Running the explain again, I noticed that when I put a subquery inside the federated query, the variable ?bp_iri is not replaced by the biological process IRI, as shown in the image below:
Considering this, how can I retrieve only the count from a federated query?
Sorry about the long post.
As in "Using Wikidata label service in federated queries", include some of the things that are nominally optional...
Note -- your remote query must actually execute on the remote endpoint, else you will get varying errors.
This is the query you're trying to run on the Uniprot endpoint --
That gets an error --
-- but that's not due to a syntax error; it's due to the ZeroOrMorePath of rdfs:subClassOf or owl:someValuesFrom properties (the (rdfs:subClassOf|owl:someValuesFrom)* Property Path) you're querying, which has to try MANY possibilities.
If you limit the depth of that path, the Uniprot endpoint can handle it, and you can run it through Federated SPARQL.
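To illustrate what limiting the depth means, here is a minimal sketch. It shows the technique and does not reproduce any query that was actually run. Each `?` step matches zero or one hop, so three of them in sequence reach at most three levels. The GO term in VALUES is an arbitrary example chosen for illustration, not one taken from the question.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX up:   <http://purl.uniprot.org/core/>
PREFIX obo:  <http://purl.obolibrary.org/obo/>

SELECT (COUNT(DISTINCT ?protein) AS ?count)
WHERE {
  # Assumption: example term only (GO:0006915, apoptotic process)
  VALUES ?bp_iri { obo:GO_0006915 }

  # Three ZeroOrOnePath steps in sequence instead of one ZeroOrMorePath (*):
  # matches ?bp_iri itself and anything up to three hops below it
  ?sub_bp (rdfs:subClassOf|owl:someValuesFrom)? /
          (rdfs:subClassOf|owl:someValuesFrom)? /
          (rdfs:subClassOf|owl:someValuesFrom)? ?bp_iri .

  ?protein up:classifiedWith ?sub_bp .
  ?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
```

The trade-off is that anything deeper than three levels is not counted, so the number is a lower bound compared with the unbounded `*` path.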
Here's a reduced depth query (which I arbitrarily tried with 3 "ZeroOrOnePath") --
-- that got a result --
-- which I found was the same result down to a single level --
I just ran this query through URIBurner.com (which permits Federated SPARQL for authenticated users) --
That still produces an error --
-- which suggests different settings are in play on the Uniprot server when you go directly through their web query form, which uses JDBC against their SPARQL server, than when you go straight through HTTP, as with Federated SPARQL.
I think the solution you need is a local Uniprot mirror, or a connection to the public Uniprot instance that has different permissions/settings than the primary public endpoint.
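On the narrower question of returning only the count: in SPARQL, only the variables a subquery projects are visible outside it. The subquery in EDIT 1 projects just ?bp_total, so its ?bp_iri is unrelated to the outer ?bp_iri, which is consistent with the IRI no longer being substituted. A minimal sketch that projects and groups by ?bp_iri follows. It assumes the depth-limited path discussed above and writes the local uniprot:Gene class with the up: prefix (same namespace). It has not been run against the public endpoint, so it may hit the same error. Whether an engine such as ARQ sends one bound request per term or a single unbound one is also engine-dependent.

```sparql
PREFIX obo:  <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX up:   <http://purl.uniprot.org/core/>

SELECT ?bp_iri ?bp_count ?bp_total ((?bp_count / ?bp_total) AS ?divided)
WHERE {
  {
    # Local part: number of genes per biological process
    SELECT ?bp_iri (COUNT(?bp_iri) AS ?bp_count) WHERE {
      ?genes_iri a up:Gene ;
                 obo:RO_0000056 ?bp_iri .
    }
    GROUP BY ?bp_iri
  }
  SERVICE SILENT <https://sparql.uniprot.org/sparql/> {
    {
      # Remote part: only ?bp_iri and the count leave the endpoint.
      # Projecting ?bp_iri is what lets it join with the local result.
      SELECT ?bp_iri (COUNT(DISTINCT ?protein) AS ?bp_total) WHERE {
        ?sub_bp (rdfs:subClassOf|owl:someValuesFrom)? /
                (rdfs:subClassOf|owl:someValuesFrom)? /
                (rdfs:subClassOf|owl:someValuesFrom)? ?bp_iri .
        ?protein up:classifiedWith ?sub_bp .
        ?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
      }
      GROUP BY ?bp_iri
    }
  }
}
ORDER BY DESC(?divided)
```

If the engine does not push the local ?bp_iri bindings into the subquery, the remote endpoint has to compute the count for every term it knows, which is likely to be too heavy for a public endpoint; in that case a local mirror remains the more realistic option.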