I have a great deal of RDF data over which I need to run inference, and I need to develop my own inference rules. Is there any way to do this? Is it OK to use Jena rules and SPARQL for it? Do Jena rules and SPARQL queries have to load all the data into memory? Hoping for answers soon, and thanks in advance!
Any way to do inference with Jena without loading all data into memory?
Asked by Wang Ruiqi · 2k views
There are 2 answers
In addition to what Ian said, and depending on your rules: if materializing all inferred triples is feasible in a streaming fashion in your case, have a look at the source code of RIOT's infer command and, if you need more than RDFS, think about how you might add support for a subset of OWL. You can find the source code here:
- https://svn.apache.org/repos/asf/jena/trunk/jena-arq/src/main/java/riotcmd/infer.java
- https://svn.apache.org/repos/asf/jena/trunk/jena-arq/src/main/java/org/openjena/riot/pipeline/inf/InferenceSetupRDFS.java
- https://svn.apache.org/repos/asf/jena/trunk/jena-arq/src/main/java/org/openjena/riot/pipeline/inf/InferenceProcessorRDFS.java
The approach of RIOT's infer command can also be used with MapReduce; you can find an example here:
The Jena inference engines definitely work best when all of the data is loaded into memory. This is because, while it is possible to connect an inference engine to any Jena `Model`, including a persistent store such as TDB, the inference algorithms make many calls to check the presence or absence of particular triples in the model, and those checks become inefficient when each one requires hitting disk.

If you have relatively straightforward inference requirements, you may be able to express your entailments via carefully constructed SPARQL queries, in which case you can probably get away with querying a TDB or SDB store directly. It just depends on the complexity of the query.
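To illustrate the SPARQL approach, here is a minimal sketch of materializing one RDFS entailment (`rdfs:subClassOf`) with a plain CONSTRUCT query. It uses an in-memory model and the current Apache Jena package names; the `http://example.org/` resources are hypothetical. Against a TDB-backed model, the same query would run on the store without a reasoner.

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class SparqlEntailment {
    public static void main(String[] args) {
        // Small example model: rex is a Dog, Dog is a subclass of Animal
        Model model = ModelFactory.createDefaultModel();
        String ns = "http://example.org/";               // hypothetical namespace
        Resource dog = model.createResource(ns + "Dog");
        Resource animal = model.createResource(ns + "Animal");
        Resource rex = model.createResource(ns + "rex");
        model.add(dog, RDFS.subClassOf, animal);
        model.add(rex, RDF.type, dog);

        // Express the subClassOf entailment as a CONSTRUCT query:
        // every instance of a subclass is also an instance of its superclass
        String construct =
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "CONSTRUCT { ?x a ?super } " +
            "WHERE { ?x a ?sub . ?sub rdfs:subClassOf ?super }";

        try (QueryExecution qe = QueryExecutionFactory.create(construct, model)) {
            model.add(qe.execConstruct());               // add the entailed triples back
        }
        System.out.println(model.contains(rex, RDF.type, animal)); // true
    }
}
```

For a full closure you would re-run such queries until no new triples are produced (e.g. for transitive chains of subclasses); a single pass only adds one level of entailment.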
If the contents of your triple store are reasonably stable, or can be partitioned into a stable persistent set and a dynamic in-memory set, then one strategy is to pre-compute the inference closure and store that in a persistent store: the classic space/time trade-off, in other words. There are two ways to do this: first, use the inference engine with an in-memory store, giving it as much heap space as you can; second, use Jena's RIOT infer command-line script.
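The first of those two options can be sketched as follows, assuming a custom rule in Jena's rule syntax (the `ex:ancestor` transitivity rule and namespace here are illustrative, not from the question): run a `GenericRuleReasoner` over an in-memory model, then snapshot base data plus entailments into a plain model that can be written out or bulk-loaded into TDB.

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class PrecomputeClosure {
    public static void main(String[] args) {
        // Example base data: a -> b -> c ancestor chain
        String ns = "http://example.org/";               // hypothetical namespace
        Model base = ModelFactory.createDefaultModel();
        Property ancestor = base.createProperty(ns, "ancestor");
        Resource a = base.createResource(ns + "a");
        Resource b = base.createResource(ns + "b");
        Resource c = base.createResource(ns + "c");
        base.add(a, ancestor, b).add(b, ancestor, c);

        // A custom transitivity rule in Jena's rule syntax
        String rules = "@prefix ex: <" + ns + ">. " +
            "[trans: (?x ex:ancestor ?y) (?y ex:ancestor ?z) -> (?x ex:ancestor ?z)]";
        Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, base);

        // Snapshot the closure: a plain model holding base data + entailments,
        // with no reasoner attached. This is what you would persist (e.g. to TDB).
        Model closure = ModelFactory.createDefaultModel().add(inf);
        System.out.println(closure.contains(a, ancestor, c)); // true: inferred triple
    }
}
```

Once the closure is persisted, queries run against the plain store and never touch the reasoner, which is exactly the space-for-time trade being described.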
For large-scale RDF inferencing, an alternative to the open-source Jena platform is the commercial product Stardog from Clark & Parsia (which I believe has a Jena model connector, though I haven't used it myself).