With the following code, I want to serialize a Data.Text value to a ByteString. Unfortunately my text is prepended with unnecessary NUL bytes and an EOT byte:
GHCi, version 9.4.4: https://www.haskell.org/ghc/ :? for help
ghci> import qualified Data.Text as T
ghci> import Data.Binary
ghci> import Data.Binary.Put
ghci> let txt = T.pack "Text"
ghci> runPut $ put txt
"\NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTText"
ghci>
Questions:
- Why are these NUL and EOT bytes generated?
- How can I avoid them in the resulting ByteString?
PS: In the real code I put the length in front of the text:
foo :: Text -> ByteString
foo txt = runPut $ do
  -- T.length returns an Int; putWord32host needs a Word32
  putWord32host (fromIntegral (T.length txt))
  put txt
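It actually already encodes the length in the binary string. If we look at the source code of the Binary instance for Text, we see that it encodes the text to UTF-8, which produces a ByteString, and then uses put on that ByteString. That's not much of a surprise. But the length is added when we put the ByteString itself: the Binary instance for ByteString first puts the length of the ByteString and then its raw bytes.

The put for the ByteString produced by encodeUtf8 thus writes eight bytes to specify the size of the ByteString. This is the number of bytes, which is not necessarily the same as the number of characters in the Text. The size is an Int, which binary serializes as a 64-bit big-endian integer: for "Text" that gives seven NUL bytes followed by \EOT (byte 4, since "Text" is four UTF-8 bytes), which is exactly the prefix in your output. In your foo the length therefore ends up in the output twice.

If you want the same effect, but without the length prefix, write the UTF-8 bytes with putByteString instead of put, for example runPut (putByteString (encodeUtf8 txt)). This omits the length header.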
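A minimal sketch contrasting these encodings side by side, assuming the binary and text packages as used in the question. The function names withBinaryPrefix, manualBinaryPrefix, utf8Only and withWord32Prefix are hypothetical. manualBinaryPrefix spells out by hand what the explanation above says put does for a ByteString; it is an illustration, not the library source. withWord32Prefix is a variant of the question's foo that counts UTF-8 bytes instead of characters and uses putWord32be rather than putWord32host; big-endian is an assumption made here only so the output does not depend on the machine's byte order.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary (put)
import Data.Binary.Put (putByteString, putWord32be, runPut)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- Default Binary encoding of Text: an 8-byte big-endian length
-- (counting UTF-8 bytes), followed by the UTF-8 bytes.
withBinaryPrefix :: T.Text -> BL.ByteString
withBinaryPrefix txt = runPut (put txt)

-- Hand-written equivalent, following the explanation above: `put` on an
-- Int writes 8 big-endian bytes, then the raw bytes follow.
-- A sketch for illustration, not the library source.
manualBinaryPrefix :: T.Text -> BL.ByteString
manualBinaryPrefix txt = runPut $ do
  let bytes = encodeUtf8 txt
  put (BS.length bytes)
  putByteString bytes

-- Raw UTF-8 only: putByteString writes the bytes with no length header.
utf8Only :: T.Text -> BL.ByteString
utf8Only txt = runPut (putByteString (encodeUtf8 txt))

-- Hypothetical variant of the question's `foo`: a single 4-byte length
-- prefix counting UTF-8 bytes (T.length would count characters),
-- followed by the raw bytes. Big-endian is an assumption; the question
-- used putWord32host.
withWord32Prefix :: T.Text -> BL.ByteString
withWord32Prefix txt = runPut $ do
  let bytes = encodeUtf8 txt
  putWord32be (fromIntegral (BS.length bytes))
  putByteString bytes
```

The only difference between the default encoding and utf8Only is put versus putByteString on the encoded ByteString: put goes through the Binary instance and adds the length, while putByteString writes the bytes as they are. For a non-ASCII string such as "é" the prefix in withWord32Prefix is 2 (bytes), whereas T.length would report 1.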