Unicode conversion


Config:

  • OS: Windows 7 (32 bits)
  • DMD 2.58 using Phobos standard library

My Intent:

I have begun porting an old package (10 modules) written back in 2007. It featured full Unicode support, and I want to keep that capability.

Its author wrote a dedicated module (class UnicodeBom(T)) for that purpose. The approach is very involved, at least for me: I am just an enthusiast and a beginner with only some introductory C++/Qt/C# experience.

I removed every Tango code fragment from all the modules, but so far only 4 of the 10 modules work on my machine (at the unit/module level, at least).


Code fragment:

this(Stream st) {
  // Read the whole stream into an untyped buffer.
  void[] buf;
  buf.length = cast(size_t) st.size;
  st.readBlock(buf.ptr, buf.length);

  // Detect and strip the BOM, then convert the content to UTF-16.
  auto unicode = new UnicodeBom!(wchar)(Encoding.Unknown); // <<< to refactor
  mSourceBuffer = unicode.decode(buf); // <<< to refactor
}

where

  • st (parameter) is a std.stream.Stream
  • mSourceBuffer (private field) is a wchar[]

Quote:

Excerpt from code documentation related to final T[] decode (void[] content) method:

Convert the provided content. The content is inspected for a BOM signature, which is stripped. An exception is thrown if a signature is present when, according to the encoding type, it should not be. Conversely, an exception is thrown if there is no known signature where the current encoding expects one to be present.


My Question:

Is there an up-to-date, more idiomatic approach that uses only what Druntime and/or Phobos provide out of the box and leads to the same outcome (namely, loading UTF-8/16/32 files and converting them to wchar[] without the BOM)?

Thanks in advance.

There is 1 answer

menjaraz On BEST ANSWER

I eventually managed to port all the modules to DMD 2.59 by fixing the issues one at a time.
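
For reference, the Phobos-only route asked about in the question could look roughly like the sketch below. It is not the UnicodeBom code and not an existing Phobos function: decodeToUTF16, hasPrefix and readUnits are hypothetical helper names, and only std.utf.toUTF16, std.utf.validate and std.exception.enforce come from the standard library. It also assumes that input without a BOM is UTF-8, which is more lenient than the behaviour described in the UnicodeBom documentation.

```d
import std.exception : enforce;
import std.utf : toUTF16, validate;

/// Hypothetical helper: true if `data` starts with the byte signature `sig`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] sig)
{
    return data.length >= sig.length && data[0 .. sig.length] == sig;
}

/// Hypothetical helper: assemble code units of type T from raw bytes,
/// honouring the byte order announced by the BOM.
T[] readUnits(T)(const(ubyte)[] bytes, bool bigEndian)
{
    enforce(bytes.length % T.sizeof == 0, "truncated code unit");
    auto units = new T[bytes.length / T.sizeof];
    foreach (i, ref unit; units)
    {
        uint value = 0;
        foreach (k; 0 .. T.sizeof)
        {
            // Big-endian stores the most significant byte first.
            immutable shift = 8 * (bigEndian ? T.sizeof - 1 - k : k);
            value |= cast(uint) bytes[i * T.sizeof + k] << shift;
        }
        unit = cast(T) value;
    }
    return units;
}

/// Hypothetical helper: strip the BOM and return the content as UTF-16.
wchar[] decodeToUTF16(const(ubyte)[] data)
{
    // Test the 4-byte UTF-32 signatures first: the UTF-32LE BOM begins
    // with the same two bytes as the UTF-16LE BOM.
    if (hasPrefix(data, [0xFF, 0xFE, 0x00, 0x00]))
        return toUTF16(readUnits!dchar(data[4 .. $], false)).dup;
    if (hasPrefix(data, [0x00, 0x00, 0xFE, 0xFF]))
        return toUTF16(readUnits!dchar(data[4 .. $], true)).dup;
    if (hasPrefix(data, [0xEF, 0xBB, 0xBF]))
        return toUTF16(cast(const(char)[]) data[3 .. $]).dup;
    if (hasPrefix(data, [0xFF, 0xFE]) || hasPrefix(data, [0xFE, 0xFF]))
    {
        auto units = readUnits!wchar(data[2 .. $], data[0] == 0xFE);
        validate(units); // throws UTFException on malformed UTF-16
        return units;
    }
    // Assumption: no BOM means UTF-8.
    return toUTF16(cast(const(char)[]) data).dup;
}

unittest
{
    immutable(ubyte)[] utf16be = [0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69];
    immutable(ubyte)[] utf32le = [0xFF, 0xFE, 0x00, 0x00,
                                  0x48, 0x00, 0x00, 0x00,
                                  0x69, 0x00, 0x00, 0x00];
    assert(decodeToUTF16(utf16be) == "Hi"w);
    assert(decodeToUTF16(utf32le) == "Hi"w);
}
```

With a file on disk, the bytes could come from std.file.read, for example decodeToUTF16(cast(const(ubyte)[]) std.file.read(path)), where path is whatever file name applies. Assembling the code units byte by byte avoids depending on the host's endianness or on the alignment of the buffer.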