What are the pros and cons of RDB2RDF tools?


I need to know the differences between RDB2RDF tools. Could anybody tell me what the pros and cons of these tools are? I am especially interested in the following ones: Virtuoso, Ultrawrap, Ontop, Morph, XSPARQL, D2RQ, etc.


There are 2 answers

Stanislav Kralin

There are two W3C-standardized ways to convert relational data to RDF:

  1. Direct Mapping: a non-customizable default mapping. It is suitable when the relational data is well normalized and has primary keys, foreign keys, etc.
  2. R2RML: a customizable mapping.
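
To make the difference concrete, here is a minimal Python sketch of what a Direct Mapping roughly does. It loosely follows the IRI shapes used in the W3C Direct Mapping examples: a row IRI of the form <base><table>/<pk>=<value>, one rdf:type triple per row, and one triple per non-NULL column. The base IRI, the People table and the direct_map helper are assumptions made up for illustration. The sketch uses SQLite rather than MySQL, handles only a single-column primary key, and ignores foreign keys, datatypes and IRI escaping.

```python
import sqlite3

BASE = "http://example.com/base/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(conn, table, pk):
    """Yield N-Triples-like lines for every row of `table`.

    Simplified: single-column primary key, every value emitted as a
    plain string literal, no foreign-key (reference) triples.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # Row IRI: <base><table>/<pk>=<value>
        subject = f"<{BASE}{table}/{pk}={record[pk]}>"
        # Every row is typed with the table IRI.
        yield f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."
        for col, value in record.items():
            if value is None:
                continue  # NULL columns produce no triple
            yield f'{subject} <{BASE}{table}#{col}> "{value}" .'


def demo():
    """Build a tiny in-memory database and map its only table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, addr INTEGER)"
    )
    conn.execute("INSERT INTO People VALUES (7, 'Bob', NULL)")
    return list(direct_map(conn, "People", "ID"))


if __name__ == "__main__":
    print("\n".join(demo()))
```

Nothing here can be customized: table and column names dictate the IRIs. With R2RML you instead write the subject templates, predicates and (optionally) SQL queries yourself in a mapping file.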

In the survey below, I consider R2RML implementations only.

Many R2RML implementations are listed here. I do not consider tools that are:

  • dead,
  • paid,
  • requiring programming,
  • full-stack (i.e. they claim to replace all the software you already use),
  • working in the wrapper mode only, not in the ETL mode.

XSPARQL

Syntax example

java -jar cli-0.5-jar-with-dependencies.jar -h
java -server -jar -Dfile.encoding=utf-8 cli-0.5-jar-with-dependencies.jar --mysql --dbName=mydb --dbServer=127.0.0.1 --dbUser=root --r2rml=r2rml.ttl > result.ttl

Remarks

  • cli-0.5-jar-with-dependencies.jar is the command-line jar.
    Version 0.5 is preferable: you will receive "Prefix cannot be null" in later versions.

Conclusion

XSPARQL relies on an intermediate translation into XQuery, which makes it very slow.

ONTOP

Ontop is a popular Protégé plugin, but it is also available as a set of command-line utilities.

Syntax example

ontop materialize --url "jdbc:mysql://localhost:3306/mydb" --mapping "../r2rml.ttl" --username root --password "65536" --driver-class com.mysql.jdbc.Driver --disable-reasoning --format turtle --output result.ttl

Remarks

  • In MySQL, you have to run SET GLOBAL SQL_MODE=ANSI_QUOTES;

Conclusion

Ontop was designed for working with ontologies and generates a lot of ontological garbage such as ... rdf:type owl:NamedIndividual.

Ontop tries to parse and rewrite the SQL query from rr:sqlQuery. It does not understand many SQL constructs and, to its credit, says so and suggests that you create an appropriate SQL view in your relational database instead.

R2RML support is partial (see the Ontop R2RML manual). Ontop is really fast.

RDB2RDF::R2RML

I haven't been able to install this Perl module: there are many dependencies that are absent on CPAN.

D2RQ

D2RQ is a full-stack solution; however, one can extract a standalone tool from the D2RQ distribution.

R2RML is supported in the preview version only.

D2RQ provides its own mapping language (as does Ontop, by the way).

Conclusion

As far as I remember, D2RQ splits the SQL query from rr:sqlQuery into many "atomic" queries and extracts database records one by one, which is really slow.
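
The cost of record-by-record extraction can be illustrated without D2RQ itself. The following Python sketch is not D2RQ's code, only a generic comparison under assumed conditions: a hypothetical two-table SQLite schema (customer, orders) and two made-up helpers, row_by_row and single_join. Both return the same rows, but the first issues one extra query per record while the second lets the database do the join in a single statement.

```python
import sqlite3


def build_db(n_orders=100):
    """Create a hypothetical in-memory schema with some sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(i, f"c{i}") for i in range(10)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 10, float(i)) for i in range(n_orders)],
    )
    return conn


def row_by_row(conn):
    """Fetch orders, then look up each customer with a separate query."""
    queries = 0
    result = []
    orders = conn.execute(
        "SELECT id, customer_id FROM orders ORDER BY id"
    ).fetchall()
    queries += 1
    for order_id, customer_id in orders:
        name = conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()[0]
        queries += 1
        result.append((order_id, name))
    return result, queries


def single_join(conn):
    """Fetch the same data with one set-based query."""
    rows = conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customer c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = build_db()
    print("row by row:", row_by_row(conn)[1], "queries")
    print("single join:", single_join(conn)[1], "query")
```

With 100 orders, the first variant issues 101 queries and the second issues 1. Against a real database server, each of those extra queries also pays a network round trip, which is where this kind of slowdown usually comes from.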

D2RQ R2RML Manual.

CONCLUSION

My personal choice is Ontop.


Mark Miller

I haven't thought about this as rigorously as @Stanislav Kralin, or defined what I expect in terms of performance, elegance, expressiveness, etc.

More and more triplestores offer their own bridge between relational data and semantic triples. I'm thinking especially of Stardog and GraphDB. I believe that Stardog's (and Virtuoso's?) solutions don't actually dump concrete triples. Rather, they create a virtual semantic view of one or more tables.

D2R was the first instantiator I used. I'm surprised @Stanislav Kralin included it, because it is kinda dead (or unmaintained) and it does kinda require programming (or at least writing out statements in a declarative language). I didn't know about the R2RML preview... I'll have to check that out, because I was concerned about using their proprietary language.

I believe some of my academic colleagues use the reference R2RML parser.

I have been pretty happy with Karma from ISI. Instantiating tabular/relational data is a big part of my research, and I have certainly found some edge cases that have been difficult to implement, for example linking multiple singleton instances.

  • The documentation is good
  • Installation is easy
  • There's a nice web GUI, plus a command-line bulk transformation script

Karma doesn't use just pure R2RML:

  • They use R2RML
    • with JSON worksheets as the object of at least one triple
      • with Python data transformations in the JSON