WriteAllText, Character Encoding, £ and ?


Take the following example:

// Assumes: using System.IO; using System.Text; using System.Web;

// First file: written with ASCII encoding
string testfile1 = Path.Combine(HttpRuntime.AppDomainAppPath, "folder", "test1.txt");
if (!File.Exists(testfile1))
{
    File.WriteAllText(testfile1, "£100", Encoding.ASCII);
}

// Second file: same text, written with UTF-8 encoding
string testfile2 = Path.Combine(HttpRuntime.AppDomainAppPath, "folder", "test2.txt");
if (!File.Exists(testfile2))
{
    File.WriteAllText(testfile2, "£100", Encoding.UTF8);
}

Note the encoding. The first file contains ?100. The second contains £100.

I know the encoding is different, but can somebody explain why ASCII encoding can't write the £ sign?


There are 3 answers

ispiro

ASCII doesn't include the "£" character. ASCII is a 7-bit encoding that defines only code points 0-127, and it has no multi-byte sequences, so there is no byte value (or combination of bytes) that denotes that symbol. The encoder therefore writes a "?" in its place to tell you so. UTF-8, on the other hand, can encode it.
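To make the substitution visible, here is a minimal sketch that encodes the same string both ways and prints the resulting bytes. It assumes C# 9 or later (top-level statements); the variable names are only illustrative.

```csharp
using System;
using System.Text;

// "£100", with the pound sign written as an escape (U+00A3, decimal 163)
// so the result does not depend on how the source file itself is saved.
string text = "\u00A3100";

// Encoding.ASCII cannot represent U+00A3, so it substitutes '?' (0x3F).
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);

// UTF-8 encodes U+00A3 as the two-byte sequence C2 A3.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

Console.WriteLine(BitConverter.ToString(asciiBytes)); // 3F-31-30-30
Console.WriteLine(BitConverter.ToString(utf8Bytes));  // C2-A3-31-30-30
```

The original character is gone after the ASCII encode: nothing in the bytes 3F 31 30 30 records that the "?" used to be a "£".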

See here a list of all of the printable characters in ASCII.

If you must use ASCII, consider using "GBP", the ISO 4217 currency code for pound sterling. (Extended ASCII might also be relevant.)
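If the output really has to be ASCII, one possible approach is sketched below: build an ASCII encoder that throws instead of silently writing "?", and replace the pound sign with "GBP" before encoding. The replacement text "GBP " and the variable names are assumptions for illustration, not something the question prescribes. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text;

// A strict ASCII encoder: it throws on unencodable characters
// instead of silently substituting '?'.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback());

string original = "\u00A3100"; // "£100"

bool rejected = false;
try
{
    strictAscii.GetBytes(original);
}
catch (EncoderFallbackException)
{
    rejected = true; // '£' has no ASCII representation
}

// Hypothetical workaround: spell the currency out before encoding.
string asciiSafe = original.Replace("\u00A3", "GBP ");
byte[] bytes = strictAscii.GetBytes(asciiSafe);

Console.WriteLine(rejected);                        // True
Console.WriteLine(Encoding.ASCII.GetString(bytes)); // GBP 100
```

The same strict encoding object can be passed to File.WriteAllText, so a stray non-ASCII character surfaces as an exception rather than as a "?" in the file.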

Fabulous

How ASCII-based systems deal with characters like this depends largely on which code page is in use. £ isn't required or used universally across languages written in the Latin alphabet, so it never appeared in the standard ASCII set.

Look at this article or this one on code pages to see how the character limitation was resolved and for an idea as to why it won't show up everywhere.

Dietrich Baumgarten

As Hans pointed out, ASCII was designed for Americans and uses only code points 0-127; the negligible rest of the English-speaking world can live with that unless they try to use obscure symbols like £, whose code points lie outside the range 0-127. I presume you live in the UK and aim only at customers from the UK or Western Europe. In that case, don't use Encoding.ASCII but Encoding.Default, which on the .NET Framework is the system's ANSI code page: 1252 in the UK, though not in Turkey, of course. (On .NET Core and later, Encoding.Default is UTF-8.) You still get real ASCII for every character in the range 0-127, but you can also use characters in the range 128-255, where the pound symbol lives (byte 0xA3 in code page 1252). But note: if someone reads the file assuming it is encoded in UTF-8, the £ sign will be garbled, because a lone 0xA3 byte is not a valid UTF-8 sequence. Decoders usually indicate this with a replacement glyph such as �.
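The mismatch described above can be reproduced in a few lines. This sketch uses ISO-8859-1 as a stand-in for code page 1252, which is an assumption made for portability: both map "£" to the single byte 0xA3, but ISO-8859-1 is available on every .NET runtime without registering an extra encoding provider. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text;

// Stand-in for code page 1252: both encodings store '£' as byte 0xA3.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

byte[] bytes = latin1.GetBytes("\u00A3100"); // "£100"
Console.WriteLine(BitConverter.ToString(bytes)); // A3-31-30-30

// Decoding with the same encoding round-trips the text.
string sameEncoding = latin1.GetString(bytes);

// Decoding as UTF-8: a lone 0xA3 is not a valid sequence,
// so the decoder substitutes U+FFFD (the replacement character).
string asUtf8 = Encoding.UTF8.GetString(bytes);

Console.WriteLine(sameEncoding == "\u00A3100"); // True
Console.WriteLine(asUtf8 == "\uFFFD100");       // True
```

The digits survive either way because they are plain ASCII; only the byte above 127 depends on the reader guessing the right encoding.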