Encoding fails when marshalling string via COM Interop in C# (double UTF-8 encoding?)


I'm writing a plugin for Autodesk Navisworks and trying to pass a Unicode string from C# to a property on a COM object. However, the string gets encoded incorrectly somewhere along the way.

var property = ...;
property.Name = "中文";   // becomes "??"
property.Value = "中文"; // OK

"中文" shows up as "??" in the user interface, whereas ASCII-only strings (e.g. "abcd") work fine. Setting the Value property (a VARIANT) on the same object to the same text also works; only Name is affected.

Further exploration led me to try encoding the string "ä" as UTF-8:

C3 A4

and then putting each of those bytes into a (Unicode) string as a separate character:

property.Name = "\u00c3\u00a4"; // shows up as "ä"

Surprisingly, this seemed to work.
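To make the experiment above concrete, here is a minimal sketch of the byte-widening step on its own, with no COM object involved. The class and method names (`Utf8AsCharsDemo`, `Utf8BytesAsChars`) are made up for illustration; only `Encoding.UTF8.GetBytes`, `BitConverter.ToString` and `StringBuilder` are real framework calls. It only shows that the hand-written "\u00c3\u00a4" literal is what you get when every UTF-8 byte of "ä" is turned into its own UTF-16 code unit.

```csharp
using System;
using System.Text;

public static class Utf8AsCharsDemo
{
    // Hypothetical helper: widen each UTF-8 byte of `text` to one char (U+0000..U+00FF).
    public static string Utf8BytesAsChars(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
            sb.Append((char)b);
        return sb.ToString();
    }

    public static void Main()
    {
        // "\u00e4" is "ä"; written as an escape so the source file encoding does not matter.
        byte[] utf8 = Encoding.UTF8.GetBytes("\u00e4");
        Console.WriteLine(BitConverter.ToString(utf8));                    // C3-A4
        Console.WriteLine(Utf8BytesAsChars("\u00e4") == "\u00c3\u00a4");   // True
    }
}
```

If the receiving side narrows each code unit back to a byte and then decodes the result as UTF-8, this string would round-trip to "ä". That would be consistent with what I saw, but it is a guess about the application's internals.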

This led me to try the following:

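// Requires: using System.Text;
// Turn each UTF-8 byte into one UTF-16 code unit (values 0x00..0xFF).
var bytes = Encoding.UTF8.GetBytes("中文abcd");
char[] chars = new char[bytes.Length];
for (int i = 0; i < chars.Length; i++)
    chars[i] = (char)bytes[i];
string s = new string(chars);
property.Name = s;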

However, when I assign the resulting string for "中文abcd", only the first character "中" appears in the GUI. Yet with "äabcd" I get more than one character again...
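One way to compare the two cases is to dump the code units that the loop produces. The sketch below does that; `CodeUnitDump`, `Utf8BytesAsChars` and `Describe` are illustrative names, not part of any API. It flags characters for which `char.IsControl` returns true, because the widened form of "文" (UTF-8 bytes E6 96 87) contains U+0096 and U+0087, which are C1 control characters, while the widened form of "ä" contains none. Whether that is what actually causes the truncation inside the application is an assumption I have not verified.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CodeUnitDump
{
    // Hypothetical helper: same byte-widening as the loop above.
    public static string Utf8BytesAsChars(string text)
    {
        return new string(Encoding.UTF8.GetBytes(text).Select(b => (char)b).ToArray());
    }

    // Render each UTF-16 code unit as U+XXXX, marking control characters.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c =>
            "U+" + ((int)c).ToString("X4") + (char.IsControl(c) ? "(ctrl)" : "")));
    }

    public static void Main()
    {
        // "\u4e2d\u6587" is "中文", "\u00e4" is "ä" (escapes avoid source-encoding issues).
        Console.WriteLine(Describe(Utf8BytesAsChars("\u4e2d\u6587abcd")));
        Console.WriteLine(Describe(Utf8BytesAsChars("\u00e4abcd")));
    }
}
```

The first line of output contains two "(ctrl)" entries, both from the bytes of "文"; the second line contains none.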

What is happening here? How can I get around the problem? Is it a marshalling problem (e.g. incorrectly specified encoding in the COM Interop)? Or perhaps some weird code inside the application? If it's a marshalling problem, can I modify it for this property only?

Answer by johv (best answer)

Turns out that Name was an "internal" string, and I should have used the property UserName for text displayed in the GUI.

I.e. I changed:

var property = ...;
property.Name = "中文";   // becomes "??"
property.Value = "中文"; // OK

to this:

var property = ...;
property.UserName = "中文";   // OK!
property.Value = "中文"; // OK

which worked. Presumably, when only Name is set, the application derives UserName from it internally in a way that ignores or mishandles the encoding.