XML Schema for scientific instrumentation time-series logging


Q:

I know there is no single perfect answer to all of this; I am hoping for some experienced insight to narrow down the possible flavors, a general strategy to avoid conversion nightmares, and any ideas for reducing my data-storage footprint on the CPU/disk (large string operations are expensive and tedious). I am on restricted hardware and somewhat new to XML standards. I can read and write XML just fine (usually for websites), but never really as a dataset encapsulation.


I have given this weeks of thought, and I am 92.3% sure that XML files are my ideal storage destination. I am logging various instrumentation readings/analysis and holding the data for months at a time, although I do have concerns about my data-collection nodes having limited hardware resources (excessive string operations get slow; 512 kB RAM, 3.2 GB flash storage).

I am trying to find a well-formed markup layout with a minimal footprint that can handle raw numerical datatypes. I do not need fully compliant files, but I am looking for a best-fit solution, so let's not deviate too far from proper form.

Primary Data Model Factors

(and why I think XML is a better fit than packed binary, flat text, or even CSV; a rough sketch of the kind of file I have in mind follows this list)

  • Up to 8 different datapoints (different measurements, brands, and sensor types)
  • various raw datatypes (REAL32, DINT, DWORD, BYTE, STRING of arbitrary length)
  • datasets need to keep absolute timestamps within each file (I have a directory full of hundreds of XML files that will eventually be merged)
  • datapoint configuration/quantity could change, so I need to be able to note alterations to the schema with minimal verbosity/confusion.
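
To make this concrete, here is a rough sketch of the kind of layout I have been imagining. Every element and attribute name in it (log, schema, pt, r, and so on) is a placeholder of my own, not an existing standard: datatypes and units are declared once in a header block so the repeating sample rows stay terse, and a schema id lets the point configuration change from file to file.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- placeholder layout: element/attribute names are made up, not a published schema -->
    <log device="node-07" schema="2">
      <!-- declared once per file: id, datatype and unit for each datapoint -->
      <schema id="2">
        <pt id="1" name="pressure" type="REAL32" unit="kPa"/>
        <pt id="2" name="flow"     type="REAL32" unit="L/min"/>
        <pt id="3" name="status"   type="DWORD"/>
        <pt id="4" name="operator" type="STRING"/>
      </schema>
      <!-- repeating sample rows: t is an absolute timestamp, pN refers to a declared point id -->
      <r t="2015-06-01T00:00:00Z" p1="101.325" p2="12.5" p3="00000001" p4="auto"/>
      <r t="2015-06-01T00:00:10Z" p1="101.311" p2="12.6" p3="00000001" p4="auto"/>
    </log>

Keeping each value as an attribute rather than a child element holds the per-row overhead down to a handful of characters per datapoint.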

Performance Constraints/Considerations

  • I will normally only write the XML out from the embedded platform, so readability is not paramount, although if I ever do need to handle a query, scanning and parsing 3.0 GB of text is not going to be fun even at its very cleanest.
    • I believe that intermittent DATE-TIME nodes will help me index such a query (see the sketch after this list).
  • Compressing the data aggressively can actually become a problem at export time, because it just adds yet more calculations to unzip my laziness later.
  • Excessively verbose XML only gives me 111 days of storage. I would like to get that up to 180 days or longer, so I do need to condense the text better.
  • There are 3 potential targets once the data is offloaded, and I don't want to run into conversion bottlenecks/mistakes by over-complicating things.
    • Microsoft Excel (it doesn't have to understand the files perfectly, but we don't want to spend hours manually importing non-compliant schema types/maps into a 2D grid).
    • RRD backend server (I will be able to run any conversions needed, but hopefully the format is already close to what RRD wants).
    • Some cute JavaScript/Android tools. Although I expect these to perform custom datatype handling, well-formed XML will make retrieval and parsing simpler during development.
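
Here is a rough illustration of the intermittent DATE-TIME idea (again, every name is a placeholder of my own rather than a standard): absolute timestamps appear only in periodic block markers, and each row carries a small offset in seconds, so the per-row text stays short while a query tool can seek by time just by scanning the block attributes instead of parsing every row.

    <!-- placeholder sketch: periodic absolute markers, compact offset rows -->
    <block t0="2015-06-01T00:00:00Z">
      <!-- dt is seconds since this block's t0; pN values follow the point ids
           declared in the file header -->
      <r dt="0"  p1="101.325" p2="12.5"/>
      <r dt="10" p1="101.311" p2="12.6"/>
      <r dt="20" p1="101.298" p2="12.6"/>
    </block>
    <block t0="2015-06-01T01:00:00Z">
      <r dt="0"  p1="101.290" p2="12.7"/>
    </block>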

1 Answer

Bert Schultheiss:

Did you consider storing your XML files in an XML database such as eXist?