Q:
I know there is not one perfect answer to all of this; I am hoping for some experienced insight to narrow down the possible flavors, a general strategy to avoid conversion nightmares, and any ideas for reducing my data-storage footprint on the CPU/disk (large string operations are expensive and tedious). I am on restricted hardware and somewhat new to XML standards. I can read and write XML just fine (usually for websites), but never really as a dataset encapsulation.
I have given this weeks of thought, and I am 92.3% sure that XML files are my ideal storage destination. I am logging various instrumentation readings/analysis and holding it for months at a time. I do have concerns, though, about my data-collection nodes' limited hardware resources (excessive string operations can get slow; 512 kB RAM, 3.2 GB flash storage).
I am trying to find a well-formed markup language with a minimal footprint that can handle raw numerical datatypes. I do not need fully compliant files, BUT I am looking for a best-fit solution, so let's not deviate too far from proper form.
Primary Data Model Factors
(and why I think XML is a better fit than packed binary, flat text, or even CSV)
- Up to 8 different datapoints (different measurements, brands, and sensor types)
- Various raw datatypes (REAL32, DINT, DWORD, BYTE, STRING of arbitrary length)
- Datasets need to keep absolute timestamps within each file (I have a directory full of hundreds of XML files that will eventually be merged)
- Datapoint configuration/quantity could change, so I need to be able to note alterations to the schema with minimal verbosity/confusion.
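For concreteness, here is the kind of compact layout I have in mind (all element and attribute names here are made up, not a schema I am committed to):

```xml
<!-- hypothetical compact layout: one element per sample, short tags,
     attributes instead of child elements to cut markup overhead -->
<log schema="3">
  <t utc="2013-06-01T00:00:00Z"/>               <!-- intermittent absolute timestamp -->
  <r dt="250" a="23.41" b="1017" c="FF2A"/>     <!-- delta-ms plus up to 8 datapoints -->
  <r dt="250" a="23.43" b="1016" c="FF2B"/>
</log>
```

The `schema` attribute would let me bump a version number whenever the datapoint configuration changes, instead of describing the change verbosely in every file.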
Performance Constraints/Considerations
- I will normally only write the XML out from the embedded platform, so readability is not paramount; although if I do need to handle any kind of query, scanning and parsing 3.0 GB of text is not going to be fun even at its very cleanest.
- I believe that intermittent DATE-TIME nodes will help me index such a query.
- Compressing data excessively can actually become a problem at export time, because those become yet more calculations to unzip my laziness.
- Excessively verbose XML gives me only 111 days of storage. I would like to get that up to 180 days or longer, so I do need to condense the text better.
- There are 3 potential targets once the data is offloaded. I don't want to run into conversion bottlenecks/mistakes by over-complicating.
  - Microsoft Excel (it doesn't have to understand the data perfectly, but we don't want to spend hours manually importing non-compliant schema types/maps into a 2D grid)
  - RRD backend server (I will be able to run any conversions needed, but hopefully I am already close to what RRD wants)
  - Some cute JavaScript/Android tools. Although I expect these to perform custom datatype handling, well-formed XML will make retrieval and parsing simpler during development.
Did you consider storing your XML files in an XML database such as eXist?