What's the meaning of the characters in the JPEG binary byte stream opened in Python?


The tutorials I've read describe a JPEG file as purely binary data. But when I open a JPEG file in Python, the content doesn't look as regular as the tutorials suggest. I expected to see something like: \xff\xd8\xff\xe0\x00\x10... But what I actually get is: \xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t Why are there characters like JFIF, C, \t\t and so on?

I'd like to understand this so that I can make small modifications to the JPEG file.


There are 2 answers

Martin Brown On

A valid JPEG file must begin with the Start of Image (SOI) marker 0xff, 0xd8 and must contain Huffman tables and quantisation tables as well as the compressed image data. There are several other optional things it can contain too - many JPEGs out of a camera will have a thumbnail embedded. A bare JPEG file doesn't need much header info but it absolutely has to begin with SOI.

In theory it should end with EOI too but only the strictest decoders are fussy about that.
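As a rough illustration of the SOI/EOI point, here is a minimal sketch that checks only the first and last two bytes. The function name `check_jpeg_framing` and the sample bytes are made up for this example, and a file with trailing bytes after EOI would report False for the second value even though most decoders accept it.

```python
SOI = b"\xff\xd8"  # Start of Image marker
EOI = b"\xff\xd9"  # End of Image marker


def check_jpeg_framing(data: bytes) -> tuple:
    """Return (starts_with_soi, ends_with_eoi) for a JPEG byte string.

    Hypothetical helper: it only inspects the framing bytes and does not
    validate anything in between.
    """
    return data.startswith(SOI), data.endswith(EOI)


# Made-up sample data, not a decodable image: SOI, some filler, EOI.
complete = SOI + b"filler" + EOI
truncated = SOI + b"filler"

print(check_jpeg_framing(complete))
print(check_jpeg_framing(truncated))
```

For a real file you would read the bytes first, for example with `open(path, "rb").read()`, where `path` is whatever file you are inspecting.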

The second item, 0xff, 0xe0, is the APP0 marker. It introduces application-specific metadata that tells the program opening the file what flavour of JPEG it is dealing with. In this case the identifier is JFIF, the JPEG File Interchange Format.
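To make the APP0 layout concrete, here is a sketch that unpacks the first 20 bytes quoted in the question using the standard `struct` module. The field order follows the JFIF APP0 layout as I understand it (length, identifier, version, density units, X and Y density, thumbnail size). The function name `parse_jfif_app0` and the dictionary keys are my own choices for illustration.

```python
import struct

# The first 20 bytes from the question: SOI followed by the APP0 segment.
header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"


def parse_jfif_app0(data: bytes) -> dict:
    """Decode a JFIF APP0 segment that directly follows SOI (hypothetical helper)."""
    if data[0:2] != b"\xff\xd8" or data[2:4] != b"\xff\xe0":
        raise ValueError("expected SOI followed by an APP0 marker")
    # Big-endian: 2-byte length, 5-byte identifier, three 1-byte fields,
    # two 2-byte densities, two 1-byte thumbnail dimensions (16 bytes total).
    (length, ident, major, minor, units,
     x_density, y_density, x_thumb, y_thumb) = struct.unpack(">H5sBBBHHBB", data[4:20])
    if ident != b"JFIF\x00":
        raise ValueError("APP0 segment is not JFIF")
    return {
        "length": length,            # includes the 2 length bytes themselves
        "version": (major, minor),
        "units": units,              # 0 means the densities are only an aspect ratio
        "x_density": x_density,
        "y_density": y_density,
        "thumbnail": (x_thumb, y_thumb),
    }


info = parse_jfif_app0(header)
print(info)
```

So the letters JFIF in the output are simply the identifier field of this segment, and the bytes after it are small numeric fields.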

A full list of all the various JPEG markers is on Wikipedia.

The two most common flavours of JPEG file are [Exif](https://en.wikipedia.org/wiki/Exif) (APP1 marker 0xff, 0xe1), produced by most modern cameras, and the older JFIF.

Some can also include comments. There have been past threads here on SO about creating the smallest possible valid JPEG image file - using esoteric and rarely seen arithmetic encoding options.

It is an interesting programming exercise to parse the markers and embedded strings in a JPEG file. I suggest trying one from a NASA or HST site as they sometimes have interesting spare thumbnails lurking in them.
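If you want to try that exercise, a minimal sketch of a marker walker might look like the following. It assumes the usual layout where each segment is 0xff, a marker byte, then a 2-byte big-endian length that counts itself, and it stops at Start of Scan (0xda) because the compressed data that follows is not length-prefixed. The name `list_segments` and the sample bytes (the question's header plus a made-up comment segment) are my own, and a truncated file will raise an error rather than be handled gracefully.

```python
import struct

# Markers with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def list_segments(data: bytes) -> list:
    """Return [(marker_byte, payload), ...] up to SOS or EOI (hypothetical helper)."""
    segments = []
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # extra 0xff fill byte before the marker, skip it
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, b""))
            if marker == 0xD9:  # EOI: nothing more to read
                break
            continue
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        segments.append((marker, data[pos + 2:pos + length]))
        pos += length
        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
            break
    return segments


# Made-up sample: SOI, the JFIF APP0 segment, a COM segment holding "hi", EOI.
sample = (b"\xff\xd8"
          b"\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
          b"\xff\xfe\x00\x04hi"
          b"\xff\xd9")

for marker, payload in list_segments(sample):
    print(f"0xff 0x{marker:02x}  {len(payload)} payload bytes")
```

On a real file, read the bytes in binary mode and pass them to the function. Embedded thumbnails would show up inside the APP0 or APP1 payloads.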

If you want more detail about the JPEG internals then Miano's book "Compressed Image File Formats" isn't a bad introduction and much more accessible than the JPEG standards document.

user3344003 On

It is not possible to implement a practicable compression/decompression system using the JPEG standard alone. The standard is sprawling, in that it includes the pet project of every academic who was on the committee. It is also incomplete: a glaring omission is that the JPEG standard says nothing about mapping colors.

In order to make JPEG usable, people had to come up with standards that (a) filled in the gaps in JPEG and (b) selected which parts of JPEG would actually be implemented.

JFIF was the first such standard and is the most commonly used. EXIF and Adobe are others that are widely used. Finally, the official SPIFF format came about but it is not commonly used.

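You are seeing a Start of Image marker followed by a JFIF APP0 marker. That is followed by a Define Quantization Table (DQT) segment. The other characters are just byte values that happen to have printable character equivalents: Python displays such bytes as characters, so C is the byte 0x43 and \t is the byte 0x09.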
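You can confirm this in Python itself. The sketch below takes the DQT bytes quoted in the question and prints them both in Python's default form and as plain hex. The variable names are mine.

```python
# The DQT bytes from the question, exactly as Python displayed them.
data = b"\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t"

# Python's default display: printable ASCII bytes appear as characters,
# tab appears as \t, everything else as \xNN.
print(data)

# The same bytes with every value written as two hex digits.
hex_view = " ".join(f"{b:02x}" for b in data)
print(hex_view)

# "C" and "\t" are ordinary byte values.
print(b"C" == b"\x43", b"\t" == b"\x09")

# The two bytes after the 0xff 0xdb marker are the segment length:
# 0x0043 = 67, i.e. 2 length bytes + 1 table-info byte + 64 table entries.
dqt_length = int.from_bytes(data[2:4], "big")
print(dqt_length)
```

Nothing in the file is text in any special sense. Iterating over a `bytes` object always gives you integers from 0 to 255, whatever the display looks like.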

The JPEG folks subsequently came out with the JPEG2000 standard that was even more academic than the original JPEG standard. There has never been a serious effort to make JPEG2000 implementable so it has never taken off.