I'm working on a project that stores two RDF Data Cubes:
- Climate Data Cube : humidity-dataset, rainfall-dataset, temperature-dataset
- Industry Data Cube : industry-dataset
Both data cubes are stored in a GraphDB database as named graphs. The datasets in these graphs share the same dimensions: time and year. Now I need to merge these datasets for data exploration. Assume we have the observations below, which contain the climate and industry data for Ha Noi city in 2016-2017:
graph : http://sda-research.ml/graph/climate
Dataset-climate
ds:obs5 a qb:Observation;
qb:dataSet ds:dataset-climate;
prop:city "Ha Noi"@en;
prop:cityid "hanoi";
prop:humidity 8.17E1;
prop:rainfall 2.1668E3;
prop:year "2016"^^xsd:int .
ds:obs6 a qb:Observation;
qb:dataSet ds:dataset-climate;
prop:city "Ha Noi"@en;
prop:cityid "hanoi";
prop:humidity 8.18E1;
prop:rainfall 2.6402E3;
prop:year "2017"^^xsd:int .
graph : http://sda-research.ml/graph/industry
Dataset-industry
ds:obs205 a qb:Observation;
qb:dataSet ds:dataset-industry;
prop:city "Hà Nội"@en;
prop:cityid "hanoi";
prop:industry 1.073E2;
prop:year "2016"^^xsd:int .
ds:obs206 a qb:Observation;
qb:dataSet ds:dataset-industry;
prop:city "Hà Nội"@en;
prop:cityid "hanoi";
prop:industry 1.07E2;
prop:year "2017"^^xsd:int .
Now I want to merge the 2 graphs so that the output contains the humidity and industry values of Hanoi in 2016-2017. On the GraphDB SPARQL endpoint, I used this query:
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX prop: <http://www.sda-research.ml/dc/prop/>
select ?city ?year ?temperature ?industry
where{
{graph ?g {
?obs a qb:Observation.
?obs prop:cityid ?cityid filter regex(?cityid, 'hanoi').
?obs prop:city ?city.
?obs prop:year ?year filter(?year >= 2017 && ?year <= 2018 ).
?obs prop:temperature ?temperature.
}
}
UNION
{graph ?g {
?obs a qb:Observation.
?obs prop:cityid ?cityid filter regex(?cityid, 'hanoi').
?obs prop:city ?city.
?obs prop:year ?year filter(?year >= 2016 && ?year <= 2017).
?obs prop:industry ?industry.
}
}
}
Expected output:
city------year------humidity------industry---
Ha Noi-----2016-------8.17E1------ 1.073E2---
Ha Noi-----2017-------8.18E1-------1.07E2----
Actual output:
city------year------humidity------industry--
Ha Noi-----2016-------8.17E1--------null----
Ha Noi-----2017-------8.18E1--------null----
Ha Noi-----2016--------null--------1.073E2--
Ha Noi-----2017--------null--------1.07E2---
How can I remove the null values when using UNION, or is there a query that gives the expected result?
There are several issues with your query before we get into the SPARQL itself.
Now, in terms of SPARQL issues:
- You match on both ?cityid and ?city, but the value of ?city is spelt differently across the named graphs, namely "Hà Nội"@en and "Ha Noi"@en.
- You use the variable ?g for your named graphs. This means that the first two of the four results are obtained by looking at the climate graph, whereas the last two come from the industry graph. When you have a specific graph in mind from which to extract data, you should specify it.
- You use REGEX. Different triplestores implement query planning differently, but this is an expensive operation that may significantly worsen your performance. See below for how to deal with this by using the values keyword.
Now here is a slightly amended query that produces the results you're after:
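As a rough sketch of how these points fit together (not a definitive version): it assumes the two graph IRIs shown in the question, that the prop: namespace is the one declared in your original query, and that humidity is the climate measure you want rather than temperature. The variable names ?climateObs and ?industryObs are only illustrative.

```sparql
PREFIX qb:   <http://purl.org/linked-data/cube#>
# Assumption: same prop: namespace as in the question's query
PREFIX prop: <http://www.sda-research.ml/dc/prop/>

SELECT ?city ?year ?humidity ?industry
WHERE {
  # Pin the city id once instead of matching it with REGEX
  VALUES ?cityid { "hanoi" }

  # Climate observations; the city label is read from this graph only,
  # so the differently spelt label in the industry graph is not joined on
  GRAPH <http://sda-research.ml/graph/climate> {
    ?climateObs a qb:Observation ;
                prop:cityid ?cityid ;
                prop:city ?city ;
                prop:year ?year ;
                prop:humidity ?humidity .
  }

  # Industry observations, joined on the shared ?cityid and ?year
  GRAPH <http://sda-research.ml/graph/industry> {
    ?industryObs a qb:Observation ;
                 prop:cityid ?cityid ;
                 prop:year ?year ;
                 prop:industry ?industry .
  }

  FILTER(?year >= 2016 && ?year <= 2017)
}
ORDER BY ?year
```

Because both graph patterns share ?cityid and ?year, this is a join rather than a UNION, so each row carries both the humidity and the industry value instead of leaving one of them unbound.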