Making Data.Text.ICU.Convert.toUnicode report decoding failures

Question

Asked by Steven Taschuk on 13 November 2014 at 23:08

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.IO
import Data.Text.ICU.Convert
import Prelude hiding (putStrLn)
main = do
    conv <- open "utf8" Nothing
    putStrLn $ toUnicode conv "h\xffzzah"
```

This program attempts to decode an invalid UTF-8 string; it prints "h�zzah", the converter having replaced the invalid byte with U+FFFD REPLACEMENT CHARACTER. I would rather it threw an exception (say, Data.Text.ICU.Error.ICUError). Is there a way to make it do so, or to otherwise report that the decoding didn't actually succeed?

Alternatively, is there a different way of doing character decoding in Haskell which reports errors of this type?
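The thread itself does not answer this alternative. For UTF-8 specifically, one option (an addition here, not part of the original answer) is `decodeUtf8'` from the `text` package's `Data.Text.Encoding`. On malformed input it returns `Left` with a `UnicodeException` instead of substituting U+FFFD. The following is a minimal sketch that assumes the `text` and `bytestring` packages are available. `decodeStrict` is only a hypothetical helper name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper name; decodeUtf8' itself comes from the text package
-- and returns Left instead of substituting U+FFFD for malformed UTF-8.
decodeStrict :: B.ByteString -> Either UnicodeException T.Text
decodeStrict = decodeUtf8'

main :: IO ()
main =
  case decodeStrict "h\xffzzah" of
    Left err  -> putStrLn ("decoding failed: " ++ show err)
    Right txt -> TIO.putStrLn txt
```

Unlike an ICU converter, this handles only UTF-8. If an exception is preferred, `decodeUtf8With strictDecode` from the same package throws the `UnicodeException` instead. Because the resulting `Text` is not evaluated until it is used, force it (for example with `Control.Exception.evaluate`) inside the `try` or `catch`.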

Answer

**jsalvata** · Accepted Answer · 2014-11-14T17:33:17+00:00

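Beyond my comment above, here's a solution. First, count the occurrences of U+FFFD in the input UTF-8 byte stream. This is a safe operation because UTF-8 is substring-safe: in valid UTF-8, the encoded bytes of one character cannot match in the middle of another character (see http://research.swtch.com/utf8). Then count the occurrences in the converted string. If the two counts differ, the extra replacement characters came from an encoding error during *your* conversion and were not present in the original data.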
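The counting check above can be sketched in Haskell as follows. This is a minimal sketch with these assumptions:

- The input is UTF-8, so U+FFFD appears in it as the bytes EF BF BD.
- `checkedDecode`, `countSubstring` and `replacementBytes` are hypothetical helper names.
- To keep the example runnable without `text-icu`, `main` uses `text`'s `decodeUtf8With lenientDecode` as a stand-in for a decoder that substitutes U+FFFD. With `text-icu`, you would pass `toUnicode conv` instead.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER (assumes UTF-8 input).
replacementBytes :: B.ByteString
replacementBytes = B.pack [0xEF, 0xBF, 0xBD]

-- Hypothetical helper: count non-overlapping occurrences of a non-empty
-- needle in a ByteString.
countSubstring :: B.ByteString -> B.ByteString -> Int
countSubstring needle = go 0
  where
    go n hay =
      let (_, rest) = B.breakSubstring needle hay
      in if B.null rest
           then n
           else go (n + 1) (B.drop (B.length needle) rest)

-- Hypothetical helper: run any U+FFFD-substituting decoder and report an
-- error if the number of replacement characters changed during decoding.
checkedDecode :: (B.ByteString -> T.Text) -> B.ByteString -> Either String T.Text
checkedDecode decode bytes
  | after == before = Right txt
  | otherwise =
      Left ("U+FFFD count changed from " ++ show before
            ++ " to " ++ show after)
  where
    txt = decode bytes
    before = countSubstring replacementBytes bytes
    after = T.count (T.singleton '\xFFFD') txt

main :: IO ()
main = do
  -- Stand-in for ICU's toUnicode: text's lenient decoder also substitutes
  -- U+FFFD for invalid bytes. With text-icu, use (toUnicode conv) here.
  let decode = decodeUtf8With lenientDecode
  print (checkedDecode decode "h\xffzzah")          -- invalid byte: Left
  print (checkedDecode decode "h\xEF\xBF\xBDzzah")  -- genuine U+FFFD: Right
```

Taking the decoder as a function argument keeps the check independent of ICU. With the question's code, the call would be `checkedDecode (toUnicode conv) bytes`. For input encodings other than UTF-8, `replacementBytes` would have to be the byte sequence for U+FFFD in that encoding, if it has one.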
