Convert Extended ASCII into UTF8

891 views Asked by Carlos Siestrup At 15 October 2020 at 13:58

I was asked to fix an encoding problem in a file. The file was expected to be in UTF-8, but it was actually in extended ASCII.

The result is a file with cases like this:

BrasÃlia; EletrÃ´nicos e InformÃ¡tica CÃ¢meras e AcessÃ³rios mÃºsica

When it should actually be:

Brasília Eletrônicos e Informática Câmeras e Acessórios música

I solved it with this code:

        private static string FixEncodingIssues(string str)
        {
            string fixedStr = str;

            // Replace each known garbled sequence with its intended character.
            foreach (KeyValuePair<string, string> pair in encodingErrosDic)
                fixedStr = fixedStr.Replace(pair.Key, pair.Value);

            return fixedStr;
        }

        private static Dictionary<string, string> encodingErrosDic = new Dictionary<string, string>()
        {
            { "Ãƒ" , "Ã" },
            { "Ã\x81"  , "Á" },
            { "Ã€" , "À" },
            { "Ã‚" , "Â" },
            { "Ã„" , "Ä" },
            { "Ã…" , "Å" },
            { "Ã‡" , "Ç" },
            { "Ãˆ" , "È" },
            { "Ã‰" , "É" },
            { "ÃŠ" , "Ê" },
            { "Ã‹" , "Ë" },
            { "ÃŒ" , "Ì" },
            { "Ã\x8D"  , "Í" },
            { "ÃŽ" , "Î" },
            { "Ã\x8F"  , "Ï" },
            { "Ã\x90"  , "Ð" },
            { "Ã‘" , "Ñ" },
            { "Ã’" , "Ò"},
            { "Ã“" , "Ó" },
            { "Ã”" , "Ô" },
            { "Ã•" , "Õ" },
            { "Ã–" , "Ö" },
            { "Ã—" , "×" },
            { "Ã˜" , "Ø" },
            { "Ã™" , "Ù" },
            { "Ãš" , "Ú" },
            { "Ã›" , "Û" },
            { "Ãœ" , "Ü" },
            { "Ã\x9D" , "Ý" },
            { "Ã\xA0" , "à" },
            { "Ã¡" , "á" },
            { "Ã¢" , "â" },
            { "Ã£" , "ã" },
            { "Ã¤" , "ä" },
            { "Ã¥" , "å" },
            { "Ã¦" , "æ" },
            { "Ã§" , "ç" },
            { "Ã¨" , "è" },
            { "Ã©" , "é" },
            { "Ãª" , "ê"},
            { "Ã«" , "ë" },
            { "Ã¬" , "ì" },
            { "Ã®" , "î" },
            { "Ã¯" , "ï" },
            { "Ã\xAD" , "í" },
            { "Ã°" , "ð" },
            { "Ã±" , "ñ" },
            { "Ã²" , "ò" },
            { "Ã³" , "ó" },
            { "Ã´" , "ô" },
            { "Ãµ" , "õ" },
            { "Ã¶" , "ö" },
            { "Ã¸" , "ø" },
            { "Ã¹" , "ù" },
            { "Ãº" , "ú" },
            { "Ã»" , "û" },
            { "Ã¼" , "ü" },
            { "Ã½" , "ý" }
        };

I would like to know if there is a nicer way to solve this issue. My solution feels too rough: it won't work for byte sequences that are not listed in the dictionary. Is there a cleaner solution that doesn't involve listing every extended character and replacing it with its UTF-8 equivalent?
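The garbled pairs in the dictionary (for example "Ã´" for "ô") look like the usual result of UTF-8 bytes being decoded as Windows-1252. If that is what happened here, the damage can be reversed without a lookup table: encode the garbled string back to bytes with Windows-1252, then decode those bytes as UTF-8. The following is a minimal sketch under that assumption, not a confirmed fix for this file. The class name MojibakeFixer is hypothetical, and the RegisterProvider call assumes .NET Core 3.0 or later (or .NET 5+), where code page 1252 is not registered by default.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the name is not from the original post.
public static class MojibakeFixer
{
    public static string FixEncodingIssues(string garbled)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy
        // code pages such as 1252 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Assumption: the text was UTF-8 that got decoded as Windows-1252.
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // Recover the original bytes by undoing the wrong decoding step.
        byte[] originalBytes = windows1252.GetBytes(garbled);

        // Decode those bytes with the encoding they were actually written in.
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

With this sketch, FixEncodingIssues("EletrÃ´nicos") yields "Eletrônicos". Note that it should only be applied to text that is known to be garbled this way: a string that is already correct (such as "música") would be damaged, because its accented characters do not form valid UTF-8 sequences once re-encoded. If the file itself is still available, it is usually simpler to avoid the problem at the source by reading it with the right encoding, for example File.ReadAllText(path, Encoding.UTF8).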
