I'm looking for a fast, lossless, fixed-space compression algorithm for the following task.
I have an embedded system with very little memory and flash.
I want to generate a core dump for it and store the result in flash and/or pull it out over a slow serial line.
All I need essentially is the heap, stack, .data and .bss segments and a few memory mapped registers.
The device is a LEON SPARC soft core.
Now, this data has a couple of oddities, which indicate that:
- The usual LZW / zlib / ... compression libraries won't do as well as they do on text. (This is not a natural language corpus.)
- SPARCs are obsessive about alignment, i.e. I can guarantee that every item in the .bss and .data segments is (effectively) an 8, 16, 32 or 64 bit int, aligned correspondingly at an 8, 16, 32 or 64 bit address boundary.
- I'd have to reserve a small fixed space for the compression code.
- It's a 32 bit machine, with the size of the .bss and .data segments very much smaller than 4 GB.
- Most 32 bit values in the system are addresses of symbols, i.e. a very, very small subset of the 4 billion possible values.
My current plan is to scan the entire .bss / .data segments and compute a complete histogram of 16 bit values.
From this I can build an optimal Huffman code for those 16 bit symbols. But I suspect I can do a lot better by exploiting the internal structure of 32 bit / 64 bit values (e.g. the variability in the most significant half of the bits is much, much less than the variability in the least significant half).
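
To make that suspicion about the upper and lower halves concrete, here is a rough sketch in C of how it could be measured before committing to an encoding. It treats the dump as an array of aligned 32 bit words and counts how many distinct upper halves and distinct lower halves occur, using one bit per possible 16 bit value (8 KiB per set). The names `half_set_t`, `half_set_add` and `survey_halves` are made up for illustration, and the whole thing assumes it is acceptable to run the survey on a host against a captured image rather than on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One bit per possible 16 bit value: 65536 bits = 8 KiB per set. */
#define HALF_VALUES 65536u

typedef struct {
    uint8_t  seen[HALF_VALUES / 8]; /* bitmap of values seen so far */
    uint32_t distinct;              /* number of bits set in the bitmap */
} half_set_t;

static void half_set_clear(half_set_t *s)
{
    memset(s, 0, sizeof *s);
}

static void half_set_add(half_set_t *s, uint16_t v)
{
    uint8_t mask = (uint8_t)(1u << (v & 7u));
    if (!(s->seen[v >> 3] & mask)) {
        s->seen[v >> 3] |= mask;
        s->distinct++;
    }
}

/* Hypothetical helper: walk n aligned 32 bit words and record which
 * upper and lower 16 bit halves occur. */
static void survey_halves(const uint32_t *words, size_t n,
                          half_set_t *hi, half_set_t *lo)
{
    half_set_clear(hi);
    half_set_clear(lo);
    for (size_t i = 0; i < n; i++) {
        half_set_add(hi, (uint16_t)(words[i] >> 16));
        half_set_add(lo, (uint16_t)(words[i] & 0xFFFFu));
    }
}
```

If `hi->distinct` comes out at a few dozen while `lo->distinct` runs into the thousands, that would suggest coding the two halves with separate models (for instance a tiny table of upper halves plus a Huffman code for the lower halves) rather than one shared 16 bit histogram.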
Any pointers / suggestions / existing work?