Can and should an ontology be used to generate code for data converters?


Given three major competing applications each implementing a slightly different data schema for the same problem domain, I am faced with the task of implementing:

  1. a "canonical" data schema expressive enough to represent something like the intersection set of features of all 3 applications as well as additional details (meta data)
  2. converters for doing (bidirectional) data exchange between those 3 applications and the canonical schema

How I currently approach the task

The canonical schema is defined using XSD and closely resembles the data schema of one of the 3 applications, let's call it A. This renders data exchange with A trivial. In order to allow for bidirectional data exchange with applications B and C (create some state in A, load it into B, alter it in B, load the altered state back into A), I try to map simple states in A onto more complex states in B/C which can be identified and deconstructed in the reverse mapping.

Example: In A, objects can simply be "mirrored" as an intrinsic geometric transformation, while in B and C we have to introduce a "mirrored subspace" in which the respective object is embedded. This "mirrored subspace" is also available in A. Thus, during the conversion B->A, we have to decide whether a "mirrored subspace" found in the data has to be mapped onto a "mirrored subspace" in A or whether it shall be replaced by an intrinsic geometric transformation of the object. I currently do this by specially labeling those "mirrored subspaces" which were only introduced during the conversion A->B.
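
A minimal sketch of this labeling convention, using hypothetical, heavily simplified Python types (AObject, BObject, BSubspace) in place of the XSD-defined schemas:

    from dataclasses import dataclass, field
    from typing import Optional, Set

    @dataclass
    class AObject:
        name: str
        mirrored: bool = False          # intrinsic geometric transformation in A

    @dataclass
    class BSubspace:
        mirrored: bool = False
        labels: Set[str] = field(default_factory=set)

    @dataclass
    class BObject:
        name: str
        subspace: Optional[BSubspace] = None

    # Label marking subspaces that exist only as conversion artifacts.
    CONVERSION_LABEL = "introduced-by-A-to-B-conversion"

    def a_to_b(a: AObject) -> BObject:
        b = BObject(name=a.name)
        if a.mirrored:
            # B cannot express an intrinsic mirror, so the object is embedded
            # in a mirrored subspace that is tagged as a conversion artifact.
            b.subspace = BSubspace(mirrored=True, labels={CONVERSION_LABEL})
        return b

    def b_to_a(b: BObject) -> AObject:
        a = AObject(name=b.name)
        if b.subspace is not None and b.subspace.mirrored:
            if CONVERSION_LABEL in b.subspace.labels:
                # Artifact of the forward conversion: collapse it back into an
                # intrinsic transformation of the object.
                a.mirrored = True
            # A genuine mirrored subspace authored in B would instead be mapped
            # onto a mirrored subspace in A (omitted in this sketch).
        return a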

Why I want to change my approach

  • Most of the schema mappings are pretty trivial (the name of an object in A simply maps to the name of the object in B), so I would like to avoid writing a lot of trivial code by hand. I imagine that this trivial code could be generated from a formalized mapping between the data schemas (see the sketch after this list).
  • For the nontrivial parts of the mapping (like the one described above), I expect lots of changes in the future simply because the mapping seems so arbitrary. In many cases a specific convention for mapping states in A onto more complex states in B/C might run into a dead end at some point. For example, it might become necessary for users to change the "mirrored subspace" label, so another approach for identifying conversion artifacts might be needed. I imagine that a formalized mapping could be a tool to transparently manage those conventions. Maybe a reasoner could even automatically spot incoherent or inconsistent mappings. It might also allow me to discuss the mapping more easily with domain experts and users.
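
To illustrate the first point, a minimal sketch of what a formalized mapping could look like for the trivial part, with hypothetical field names; both conversion directions are derived from one declarative table, which is the kind of code that could equally well be generated:

    # Hypothetical declarative mapping: each entry states that a field in A's
    # schema corresponds one-to-one to a field in B's schema.
    TRIVIAL_FIELD_MAP = {
        "name": "name",
        "material": "materialId",
        "position": "origin",
    }

    def convert_a_to_b(a_record: dict) -> dict:
        """Copy all trivially mapped fields; nontrivial rules are handled elsewhere."""
        return {b_field: a_record[a_field]
                for a_field, b_field in TRIVIAL_FIELD_MAP.items()
                if a_field in a_record}

    def convert_b_to_a(b_record: dict) -> dict:
        """Reverse direction, derived from the same mapping table."""
        return {a_field: b_record[b_field]
                for a_field, b_field in TRIVIAL_FIELD_MAP.items()
                if b_field in b_record}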

Questions

  • From what I read about ontologies I have the impression that what I want is an ontology. Is this correct?
  • As I understand it, using an ontology to describe the mapping would also require me to express the data schemas themselves in the ontology (so a relation "maps to" can reference a type from A and a type from B). Since those schemas are taken from long-lived applications, they are not always coherent. For example, a "feature" in the application might give some state different semantics than you would expect from the semantics of its constituents. Can existing tools help me manage those complexities?
  • I expect that I would require some additional machinery inside the ontology to describe something like the difference, taken from the example above, between a "permanent mirrored subspace" and a "dissipating mirrored subspace" (two types plus a special relation reconnecting them? see the sketch after these questions). Would this be much effort? Do available ontology languages provide something out of the box to express this?
  • Is this a common application of ontologies, or is it a corner case? Do you know of companies who provide services for this kind of application?
  • Which tools would you suggest for creating the ontology? I assume there are no off-the-shelf tools available for the code generation mentioned. So how would you approach the code generation task?
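
As a rough illustration of the machinery asked about above, here is a sketch using Python and rdflib (an assumption on my part; any RDF/OWL toolkit would do). It declares a type from A and a type from B, a "maps to" relation between them, and the two subspace variants; all names are hypothetical:

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    # Hypothetical namespace and class/property names, for illustration only.
    EX = Namespace("http://example.org/schema-mapping#")

    g = Graph()
    g.bind("ex", EX)
    g.bind("owl", OWL)

    # Types taken from the two application schemas.
    g.add((EX.A_MirroredSubspace, RDF.type, OWL.Class))
    g.add((EX.B_MirroredSubspace, RDF.type, OWL.Class))

    # A schema-level "maps to" relation between a type from A and a type from B.
    # (Relating classes directly like this relies on OWL 2 punning; modelling it
    # as an annotation property would be a stricter alternative.)
    g.add((EX.mapsTo, RDF.type, OWL.ObjectProperty))
    g.add((EX.A_MirroredSubspace, EX.mapsTo, EX.B_MirroredSubspace))

    # The two subspace variants from the example: one that is genuine data and
    # one that only exists as a conversion artifact, plus a relation that
    # reconnects the artifact to the intrinsic transformation it replaces.
    g.add((EX.PermanentMirroredSubspace, RDFS.subClassOf, EX.B_MirroredSubspace))
    g.add((EX.DissipatingMirroredSubspace, RDFS.subClassOf, EX.B_MirroredSubspace))
    g.add((EX.replacesIntrinsicTransformation, RDF.type, OWL.ObjectProperty))

    print(g.serialize(format="turtle"))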

1 Answer

Answered by Paul Sweatte:

Abstracting a type system, a set of interfaces, and rules for transformation is a subset of ontology creation known as meta-modeling. The Meta Object Facility (MOF) is an example:

[Meta-Object Facility (MOF) diagram]

The MOF interfaces and the MOF Model can be used to define specific metamodels for database, data warehouse, model transformation, and warehouse management domains.

The MOF's IDL translation capabilities map a single Class onto two interfaces. It would be possible to define transformations to alternate interface representations, such as Java's interfaces.
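
As a rough, non-normative analogue of that two-interface split (sketched in Python rather than IDL, with hypothetical names reusing the question's example): one interface for instance-level access, and one class-level interface for creation and classifier-scoped queries.

    from abc import ABC, abstractmethod
    from typing import List

    # Instance-level interface: attribute access and operations on one object.
    class MirroredSubspace(ABC):
        @abstractmethod
        def is_mirrored(self) -> bool: ...

    # Class-level interface: creation and classifier-scoped queries for the
    # same modeled class.
    class MirroredSubspaceClass(ABC):
        @abstractmethod
        def create(self, mirrored: bool) -> MirroredSubspace: ...

        @abstractmethod
        def all_instances(self) -> List[MirroredSubspace]: ...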

There are, and probably always will be, divergent views among industry leaders on the definitions of the concepts of Class, Type, and Interface. As long as the MOF, being a domain-specific modeling environment, is clear about the meaning of Class within the MOF, it should remain immune from such concerns.

Topic Maps are another:

Extracting knowledge into an independent layer, and enabling processing at that knowledge layer with a feedback loop going back to the sources, does not seem like the most direct and efficient way to do this. The process is comparable to a publishing workflow where authors insist on using Word but the publishers want XML: round-tripping the conversion between the two formats is not efficient, but it is sometimes necessary. Furthermore, this level of indirection is precisely what provides the power and freedom to handle knowledge in a way that can be preserved over time, regardless of what happens to the source information and, more specifically, to the systems used to handle it. All the work done to describe information, type the topics, create relationships, and manage multilingual equivalences still works. Because it has been managed independently, upgrading a system simply means disconnecting from the old system and reconnecting to the new one.

The lessons learned from working with Topic Maps for more than two decades are contrasted: because of the rapid pace of technological advances, we have been overwhelmed by the success of information technologies. Looking for the immediate next big thing has obscured our capacity to think about the fundamental nature of what we are doing. The notions of trust, reliability, and high-quality content are still central to the long-term success of our enterprises. We need to adjust to the changing ways in which the information we deal with presents itself. It is just the beginning. When we created the Topic Maps standard, we created something that turned out to be a solution without a problem: the possibility to merge knowledge networks across organizations. Despite numerous expectations and many efforts in that direction, this did not prove to meet enough demand from users. But we also developed the concept of independence between information sources and the knowledge management layers. This may turn out to be what remains in the long term, even if the fact that this idea once went by the name of Topic Maps falls into oblivion.
