Given three major competing applications each implementing a slightly different data schema for the same problem domain, I am faced with the task of implementing:
- a "canonical" data schema expressive enough to represent roughly the intersection of the features of all 3 applications, plus additional details (metadata)
- converters for doing (bidirectional) data exchange between those 3 applications and the canonical schema
How I currently approach the task
The canonical schema is defined using XSD and closely resembles the data schema of one of the 3 applications, let's call it A. This renders data exchange with A trivial. In order to allow for a bidirectional data exchange with applications B and C (create some state in A, load it into B, alter it in B, load the altered state into A), I try to map simple states in A onto more complex states in B/C which can be identified and deconstructed in the reverse mapping.
Example: In A, objects can simply be "mirrored" as an intrinsic geometric transformation, while in B and C we have to introduce a "mirrored subspace" in which the respective object is embedded. This "mirrored subspace" is also available in A. Thus during conversion B->A, we have to decide whether a "mirrored subspace" found in the data has to be mapped onto a "mirrored subspace" in A or whether it shall be replaced by an intrinsic geometric transformation of the object. I currently do this by specially labeling those "mirrored subspaces" which were only introduced during conversion A->B.
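To make the labeling convention concrete, here is a minimal sketch of the round trip in Python. All class names, field names, and the `ARTIFACT_LABEL` value are hypothetical and only stand in for the real schemas of A and B:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical marker written by the A->B converter; not part of any real schema.
ARTIFACT_LABEL = "__from_A_intrinsic_mirror__"


@dataclass
class AObject:
    name: str
    mirrored: bool = False  # intrinsic mirror transformation, available only in A


@dataclass
class ASubspace:
    mirrored: bool
    children: List[AObject] = field(default_factory=list)
    label: Optional[str] = None


@dataclass
class BObject:
    name: str  # B has no intrinsic mirroring


@dataclass
class BSubspace:
    mirrored: bool
    children: List[BObject] = field(default_factory=list)
    label: Optional[str] = None


def a_to_b(obj: AObject) -> Union[BObject, BSubspace]:
    """Map an A object to B, wrapping mirrored objects in a labeled subspace."""
    b_obj = BObject(obj.name)
    if obj.mirrored:
        return BSubspace(mirrored=True, children=[b_obj], label=ARTIFACT_LABEL)
    return b_obj


def b_to_a(node: Union[BObject, BSubspace]) -> Union[AObject, ASubspace]:
    """Map a B node back to A, collapsing subspaces that carry the artifact label."""
    if isinstance(node, BObject):
        return AObject(node.name)
    is_artifact = (
        node.mirrored
        and node.label == ARTIFACT_LABEL
        and len(node.children) == 1
    )
    if is_artifact:
        return AObject(node.children[0].name, mirrored=True)
    # A subspace created by a user in B stays a subspace in A.
    return ASubspace(node.mirrored, [AObject(c.name) for c in node.children], node.label)
```

An untouched round trip restores the intrinsic mirror, but as soon as the label is changed in B the subspace can no longer be recognized as a conversion artifact and comes back as a permanent "mirrored subspace".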
Why I want to change my approach
- Most of the schema mappings are pretty trivial (the name of an object in A simply maps to the name of the object in B), so I would like to avoid writing a lot of trivial code by hand. I imagine that this trivial code could be generated given a formalized mapping between the data schemas.
- For the nontrivial parts of the mapping (like the one described above), I expect lots of changes in the future, simply because the conventions seem so arbitrary. In many cases a specific convention for mapping states in A onto more complex states in B/C might run into a dead end at some point. For example, it might become necessary for users to change the "mirrored subspace" label, and therefore another approach for identifying conversion artifacts might be necessary. I imagine that a formalized mapping could be a tool to transparently manage those conventions. Maybe a reasoner could even automatically spot incoherent or inconsistent mappings. It might also allow me to more easily discuss the mapping with domain experts and users.
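As a rough illustration of what I mean by a formalized mapping for the trivial parts, here is a small Python sketch: the mapping is plain data, both converters are derived from it, and a simple check (far weaker than a real reasoner) rejects mappings that cannot be inverted. The field names in `FIELD_MAP` are made up for the example:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical field-level mapping (A field name, B field name).
FIELD_MAP: List[Tuple[str, str]] = [
    ("name", "title"),
    ("id", "uuid"),
    ("comment", "description"),
]

Record = Dict[str, object]


def check_mapping(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return a list of problems that would make the mapping non-invertible."""
    problems: List[str] = []
    seen_a, seen_b = set(), set()
    for a_field, b_field in pairs:
        if a_field in seen_a:
            problems.append(f"A field '{a_field}' is mapped more than once")
        if b_field in seen_b:
            problems.append(f"B field '{b_field}' is mapped more than once")
        seen_a.add(a_field)
        seen_b.add(b_field)
    return problems


def make_converters(
    pairs: List[Tuple[str, str]],
) -> Tuple[Callable[[Record], Record], Callable[[Record], Record]]:
    """Derive both conversion directions from one declarative mapping."""
    problems = check_mapping(pairs)
    if problems:
        raise ValueError("; ".join(problems))
    forward = {a: b for a, b in pairs}
    backward = {b: a for a, b in pairs}

    def a_to_b(record: Record) -> Record:
        # Fields without a mapping entry are dropped in this sketch.
        return {forward[k]: v for k, v in record.items() if k in forward}

    def b_to_a(record: Record) -> Record:
        return {backward[k]: v for k, v in record.items() if k in backward}

    return a_to_b, b_to_a
```

With something like this, changing a convention means editing the mapping table rather than two hand-written converters.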
Questions
- From what I read about ontologies I have the impression that what I want is an ontology. Is this correct?
- As I understand it, using an ontology to describe the mapping would also require me to express the data schemas themselves in the ontology (so a relation "maps to" can reference a type from A and a type from B). Since those schemas are taken from long-lived applications, they are not always coherent. For example, a "feature" in the application might cause some state to have different semantics than you would expect from the semantics of its constituents. Can existing tools help me with managing those complexities?
- I expect that I would require some additional machinery inside the ontology to describe something like the difference (taken from the example above) between a "permanent mirrored subspace" and a "dissipating mirrored subspace" (two types + a special relation reconnecting them?). Would this be much effort to do? Do available ontology languages provide something out-of-the-box to express this?
- Is this a common application of ontologies or is it a corner case? Do you know of companies who provide services for this application?
- Which tools would you suggest for creating the ontology? I assume there are no off-the-shelf tools available for the code generation mentioned. So how would you approach the code generation task?
Abstracting a type system, a set of interfaces, and rules for transformation is a subset of ontology creation known as building a meta-model. The Meta Object Facility (MOF) is one example.
Topic Maps are another.
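As a very rough sketch of the meta-model idea (this is neither MOF nor Topic Maps, just plain Python with hypothetical names), the types of each schema and the "maps to" rules between them can be kept as data, so that a rule referencing a type or attribute that does not exist is caught mechanically:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class TypeDef:
    schema: str                 # e.g. "A" or "B"
    name: str                   # type name inside that schema
    attributes: FrozenSet[str]  # attribute names of the type


@dataclass(frozen=True)
class MapsTo:
    source: str  # reference of the form "Schema.Type.attribute"
    target: str


def index_types(types: List[TypeDef]) -> Dict[str, TypeDef]:
    """Index type definitions by their qualified name, e.g. 'A.Object'."""
    return {f"{t.schema}.{t.name}": t for t in types}


def unresolved_references(rules: List[MapsTo], types: List[TypeDef]) -> List[str]:
    """Return every rule endpoint that does not resolve to a known attribute."""
    idx = index_types(types)
    missing: List[str] = []
    for rule in rules:
        for ref in (rule.source, rule.target):
            type_key, _, attribute = ref.rpartition(".")
            type_def = idx.get(type_key)
            if type_def is None or attribute not in type_def.attributes:
                missing.append(ref)
    return missing
```

Here a rule mapping A's intrinsic `mirrored` attribute onto a B object would be reported as unresolved, which is exactly the kind of place where a non-trivial convention such as the "mirrored subspace" has to be introduced.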
References
InfoGrid Web Graph Database: What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model?
Meta Object Facility (MOF) Specification (pdf)
Topic Maps Now
An Introduction to Topic Maps
TM4J - Topic Maps For Java
Code Generation with OpenDDS, Part I :: OCI
Paws - A Perl SDK for AWS (Amazon Web Services) APIs - metacpan.org