Determining ISO-8859-1 vs US-ASCII charset

Question

Asked by vikingsteve on 10 June 2015 at 08:02 (15.2k views)

I am trying to determine whether to use

PrintWriter pw = new PrintWriter(outputFilename, "ISO-8859-1");

or

PrintWriter pw = new PrintWriter(outputFilename, "US-ASCII");

I was reading "All about character sets" to determine the character set of an example file, because I must create a file in the same encoding from Java code.

When my example file contains "European" letters (Norwegian: å ø æ), then the following command tells me the file encoding is "iso-8859-1"

file -bi example.txt

However, when I take a copy of the same example file and modify it to contain different data, without any Norwegian text (let's say, I replace "Bjørn" with "Bjorn"), then the same command tells me the file encoding is "us-ascii".

file -bi example-no-european-letters.txt

What does this mean? Is ISO-8859-1 in practice the same as US-ASCII if there are no "European" characters in it?

Should I just use the charset "ISO-8859-1" and everything will be OK?

There are 2 answers

Kaliappan on 10 June 2015 at 08:32

It depends on which characters the document uses. ASCII is a 7-bit charset, while ISO-8859-1 is an 8-bit charset that supports some additional characters. If you are reproducing the document from an InputStream, I would mostly recommend the ISO-8859-1 charset. It will work for text files opened in programs such as Notepad and MS Word.

If you are using other international characters, you need to check which charset supports those particular characters, for example UTF-8.

Kayaman (accepted answer) on 10 June 2015 at 08:10

If the file contains only 7-bit US-ASCII characters, it can be read as US-ASCII. That doesn't tell you anything about which charset was intended. It may be just a coincidence that the file contains no characters that would require a different encoding.
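As a rough illustration of why a tool might report "us-ascii" for such a file, the sketch below checks whether every byte falls in the 7-bit range. The class and method names are hypothetical, and this is not a claim about how the `file` command works internally; it only shows the property being discussed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class AsciiCheck {
    // Returns true when every byte is in the 7-bit range 0x00-0x7F.
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            // Java bytes are signed, so values 0x80-0xFF show up as negative numbers.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java AsciiCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isPureAscii(data)
                ? "only 7-bit bytes: readable as US-ASCII"
                : "contains bytes above 0x7F: not plain US-ASCII");
    }
}
```

A `true` result only says the bytes are compatible with US-ASCII. It says nothing about which charset the producer of the file had in mind.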

ISO-8859-1 (like ISO-8859-15) is a common European encoding, able to encode äöåéü and other characters. Its first 128 characters (code points 0 to 127) are the same as in US-ASCII, as is often the case for convenience reasons.
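The overlap between the two charsets can be checked directly. The following is a minimal sketch with hypothetical class and method names: it encodes the same string in both charsets and compares the resulting bytes. Note that `String.getBytes(Charset)` substitutes a replacement byte for characters the charset cannot represent, which is why the non-ASCII case differs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SameBytesDemo {
    // True when the text produces identical bytes in US-ASCII and ISO-8859-1.
    static boolean sameInBothCharsets(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(ascii, latin1);
    }

    public static void main(String[] args) {
        // \u00f8 is the letter ø, written as an escape so the source file's own
        // encoding cannot affect the result.
        System.out.println(sameInBothCharsets("Bjorn"));      // prints true
        System.out.println(sameInBothCharsets("Bj\u00f8rn")); // prints false
    }
}
```

For text without any non-ASCII characters, the output file is byte-for-byte the same whichever of the two charsets is passed to `PrintWriter`.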

However, you can't just pick an encoding and assume that "everything will be OK". The very common UTF-8 encoding is also a superset of US-ASCII, but it will encode, for example, the characters äöå as two bytes each instead of ISO-8859-1's one byte.
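The size difference is easy to observe. This sketch (hypothetical names again) counts the bytes a string occupies in a given charset, using the question's "Bjørn" as input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String name = "Bj\u00f8rn"; // "Bjørn", with ø written as a Unicode escape
        System.out.println(encodedLength(name, StandardCharsets.ISO_8859_1)); // prints 5
        System.out.println(encodedLength(name, StandardCharsets.UTF_8));      // prints 6
    }
}
```

A reader expecting ISO-8859-1 would see those two UTF-8 bytes as two separate characters, which is the usual source of garbled text such as "BjÃ¸rn".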

TL;DR: Don't assume things with encodings. Find out what was intended and use that. If you can't find it out, observe the data to try to figure out what is a correct charset to use (as you noted yourself, multiple encodings may work at least temporarily).
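One way to "observe the data" is to try decoding the bytes strictly with each candidate charset and see which ones fail. The sketch below is only an assumption about how that could be done, with hypothetical class and method names. It configures a `CharsetDecoder` to report errors instead of silently replacing bad input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetProbe {
    // True when the bytes decode without any malformed or unmappable input.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Bjørn" as it would be stored in an ISO-8859-1 file (ø is the single byte 0xF8).
        byte[] sample = {0x42, 0x6A, (byte) 0xF8, 0x72, 0x6E};
        System.out.println(decodesCleanly(sample, StandardCharsets.US_ASCII));   // prints false
        System.out.println(decodesCleanly(sample, StandardCharsets.UTF_8));      // prints false
        System.out.println(decodesCleanly(sample, StandardCharsets.ISO_8859_1)); // prints true
    }
}
```

This can rule out US-ASCII or UTF-8, but it can never rule out ISO-8859-1, because every byte value is valid in that charset. A clean decode therefore shows only that a charset is possible, not that it was the one intended.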
