Choosing a magic byte least likely to appear in real data

Question

Choosing a magic byte least likely to appear in real data

373 views Asked by jberryman At 21 November 2014 at 23:12

I hope this isn't too opinionated for SO; it may not have a good answer.

In a portion of a library I'm writing, I have a byte array that gets populated with values supplied by the user. These values might be of type Float, Double, Int (of different sizes), etc. with binary representations you might expect from C, say. This is all we can say about the values.

I have an opportunity for an optimization: I can initialize my byte array with the byte MAGIC, and then whenever no byte of the user-supplied value is equal to MAGIC I can take a fast path, otherwise I need to take the slow path.

So my question is: what is a principled way to go about choosing my magic byte, such that it will be reasonably likely not to appear in the (variously-encoded and distributed) data I receive?

Part of my question, I suppose, is whether there's something like a Benford's law that can tell me something about the distribution of bytes in many sorts of data.

Original Q&A

There are 2 answers

grtamayo On 22 February 2022 at 10:57

I was inclined to use byte 255 but I discovered that is also prevalent in MSWord files. So I use byte 254 now, for EOF code to terminate a file.

**Potatoswatter** · Accepted Answer · 2014-11-21T23:16:30+00:00

Capture real-world data from a diverse set of inputs that would be used by applications of your library.

Write a quick and dirty program to analyze dataset. It sounds like what you want to know is which bytes are most frequently totally excluded. So the output of the program would say, for each byte value, how many inputs do not contain it.

This is not the same as least frequent byte. In data analysis you need to be careful to mind exactly what you're measuring!

Use the analysis to define your architecture. If no byte never appears, you can abandon the optimization entirely.

TechQA.

Choosing a magic byte least likely to appear in real data

There are 2 answers

Related Questions in LANGUAGE-AGNOSTIC

Related Questions in BYTE

Related Questions in PROBABILITY

Related Questions in MAGIC-NUMBERS

Popular Questions

Popular Tags

Trending Questions