BOM (byte order mark) of ISO Encoding


Is there a BOM for the ISO-8859-1 and ISO-8859-2 encodings?

Giacomo Catenazzi (best answer)

No. There is no need for a BOM (byte order mark) in an encoding where (with few exceptions) every character is a single byte. The BOM is used to determine the byte order of 16-bit (or 32-bit) numbers: different processors use different conventions, and so do different protocols. Internet protocols (IP) use big-endian order, whereas the common Intel processors (and so the common operating systems) are little-endian.
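To make the byte-order point concrete, here is a small Python sketch. It only uses the standard `codecs` module; the sample character and the variable names are chosen for illustration.

```python
import codecs

text = "é"  # U+00E9

# ISO-8859-1: one byte per character, so there is no byte order to record.
latin1 = text.encode("iso-8859-1")

# UTF-16: the same character is the 16-bit number 0x00E9,
# and its two bytes can be stored in either order.
utf16_le = text.encode("utf-16-le")  # low byte first
utf16_be = text.encode("utf-16-be")  # high byte first

# The BOM is U+FEFF written in the chosen byte order.
bom_le = codecs.BOM_UTF16_LE
bom_be = codecs.BOM_UTF16_BE

# The generic "utf-16" decoder reads the BOM, picks the order, and drops it.
decoded_le = (bom_le + utf16_le).decode("utf-16")
decoded_be = (bom_be + utf16_be).decode("utf-16")

print(latin1, utf16_le, utf16_be, bom_le, bom_be, decoded_le, decoded_be)
```

With a single-byte encoding the first line is the whole story, which is why ISO-8859-1 and ISO-8859-2 define no BOM.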

Note: one large company (Microsoft) is known to break standards for its own advantage, and it started putting an unnecessary (and often wrong) BOM at the start of UTF-8 files as well. (UTF-8 may use a BOM only in a few specific circumstances.) Do not fall into the trap. Unix, Linux, and Apple were able to move to UTF-8 with little disruption.
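If you have to read files that may carry such a UTF-8 BOM, Python's standard `utf-8-sig` codec is one way to cope. A minimal sketch, with made-up sample data:

```python
import codecs

# Sample bytes as some Windows tools would write them: UTF-8 BOM + text.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the BOM as a U+FEFF character in front of the text.
kept = data.decode("utf-8")

# "utf-8-sig" removes a leading BOM if there is one...
stripped = data.decode("utf-8-sig")

# ...and decodes BOM-less UTF-8 unchanged.
unchanged = "hello".encode("utf-8").decode("utf-8-sig")

print(repr(kept), repr(stripped), repr(unchanged))
```

When writing, plain `utf-8` never emits a BOM, which matches the advice above.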

The encoding information should be supplied out-of-band (e.g. specified by the protocol). There is no other way. In the old 8-bit charsets there is no room to include such information (256 characters are already not enough). Python and some editors will look for a signature (a line of text) at the beginning or at the end of a file, but that is ugly outside source code, and not all editors use such information.

Otherwise, use the usual method: try to decode the data as UTF-8 (provided there are no 00 bytes; if there are, check UTF-16 and UTF-32 first). If you get errors, try Latin-1 or other encodings (for those you need a dictionary of common words in many languages). In any case this involves a lot of heuristics (that is, guesses), and one is never sure about the encoding; only for large texts written for humans is the probability of guessing correctly high.
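The procedure above can be sketched roughly as follows. This is only an illustration of the guessing order, not a reliable detector: `guess_encoding` is a hypothetical helper name, the NUL-position test for UTF-16 assumes mostly ASCII text, and the final fallback does not attempt the dictionary step needed to tell Latin-1 from Latin-2 or other 8-bit charsets.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Return a best-guess codec name for data (heuristic, may be wrong)."""
    # 1. A BOM, if present, settles it. UTF-32 is checked before UTF-16
    #    because the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name

    # 2. 00 bytes suggest UTF-16 or UTF-32 rather than UTF-8 or Latin-1.
    if b"\x00" in data:
        # The UTF-32 decoders are strict, so wrong guesses usually fail.
        for name in ("utf-32-le", "utf-32-be"):
            try:
                data.decode(name)
                return name
            except UnicodeDecodeError:
                pass
        # Assumption: mostly ASCII text, so the zero byte of each
        # 16-bit unit shows which half is the high byte.
        even_nuls = data[0::2].count(b"\x00")
        odd_nuls = data[1::2].count(b"\x00")
        return "utf-16-be" if even_nuls > odd_nuls else "utf-16-le"

    # 3. Valid UTF-8 is unlikely to occur by accident in 8-bit text.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Last resort: Latin-1 never fails to decode, so this is a pure guess.
        return "iso-8859-1"

print(guess_encoding("héllo".encode("utf-8")))
print(guess_encoding("héllo".encode("iso-8859-1")))
print(guess_encoding("hi".encode("utf-16-be")))
```

Note that a short pure-ASCII input is reported as `utf-8` even if it was written as Latin-1; for such data the two encodings produce identical bytes, so the distinction does not matter.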