Appending UnicodeString to WideString in Delphi


I'm curious about what happens with this piece of code in Delphi 2010:

function foo: WideString;
var
  i: Integer;
  myUnicodeString: UnicodeString;
begin
  result := '';  // a managed-type Result is not guaranteed to start out empty
  for i := 1 to 1000 do
  begin
    myUnicodeString := ... something ...;

    result := result + myUnicodeString;  // This is where I'm interested
  end;
end;

How many string conversions are involved, and are any particularly bad performance-wise?

I know the function should just return a UnicodeString instead, but I've seen this anti-pattern in the VCL streaming code, and I want to understand the process.


There are 2 answers

Remy Lebeau (best answer)

To answer your question about what the code is actually doing, this statement:

result := result + myUnicodeString;

does the following:

  1. calls System._UStrFromWStr() to convert Result to a temp UnicodeString

  2. calls System._UStrCat() to concatenate myUnicodeString onto the temp

  3. calls System._WStrFromUStr() to convert the temp to a WideString and assign it back to Result.
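To make the three steps above concrete, here is a rough sketch that writes them out as explicit assignments. It assumes Delphi 2009 or later on Windows; the function name AppendViaTemp is hypothetical, and the explicit conversions are expected to compile to essentially the same internal System helper calls listed above, rather than being the RTL's actual code.

```delphi
program ConcatSteps;

{$APPTYPE CONSOLE}

// Hypothetical function: spells out what "Result := Result + S" amounts to
// when Result is a WideString and S is a UnicodeString.
function AppendViaTemp(const W: WideString; const S: UnicodeString): WideString;
var
  Temp: UnicodeString;
begin
  Temp := W;         // step 1: copy the WideString into a temporary UnicodeString
  Temp := Temp + S;  // step 2: ordinary UnicodeString concatenation
  Result := Temp;    // step 3: copy the temporary back into a new WideString (BSTR)
end;

begin
  Assert(AppendViaTemp('abc', 'def') = 'abcdef');
  Writeln(AppendViaTemp('abc', 'def'));
end.
```

Every step copies characters even though no encoding change happens, which is where the overhead comes from.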

There is a System._WStrCat() function for concatenating a WideString onto a WideString (and System._UStrCat() for UnicodeString). If CodeGear/Embarcadero had been smarter about it, they could have implemented a System._WStrCat() overload that takes a UnicodeString as input and a WideString as output (and vice versa for concatenating a WideString onto a UnicodeString). That way, no temporary UnicodeString conversions would be needed. Both WideString and UnicodeString are encoded as UTF-16 (well, mostly, but I won't get into that here), so concatenating them is just a matter of a single allocation and a memory move, exactly as when concatenating two UnicodeStrings or two WideStrings.
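The single-allocation idea described above can be approximated in user code. The sketch below is a hypothetical helper (AppendToWide is not part of the RTL), assuming Delphi 2009 or later on Windows, where both string types store UTF-16 code units so the characters can be copied directly with Move.

```delphi
program AppendWide;

{$APPTYPE CONSOLE}

// Hypothetical helper (not an RTL routine): appends a UnicodeString onto a
// WideString without a round trip through a temporary UnicodeString.
procedure AppendToWide(var Dest: WideString; const Src: UnicodeString);
var
  OldLen: Integer;
begin
  if Src = '' then
    Exit;
  OldLen := Length(Dest);
  // Give Dest a buffer of the new length; SetLength keeps the existing characters.
  SetLength(Dest, OldLen + Length(Src));
  // Both types hold UTF-16 code units, so the new characters are a plain memory copy.
  Move(PWideChar(Src)^, (PWideChar(Dest) + OldLen)^, Length(Src) * SizeOf(WideChar));
end;

var
  W: WideString;
begin
  W := 'abc';
  AppendToWide(W, 'def');
  Assert(W = 'abcdef');
  Writeln(W);
end.
```

Each call still allocates a new BSTR and copies the existing characters once, so many appends in a loop remain costly; the helper only removes the extra copies through the temporary UnicodeString.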

David Heffernan

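The performance is poor. No character-encoding conversion is needed, since both types are UTF-16 encoded, but each conversion still copies the whole string. On top of that, WideString is a wrapper around the COM BSTR type, which is allocated by the system (COM) allocator and is not reference counted, so every assignment makes a full copy. That makes it perform worse than the native UnicodeString. In a loop like the one in the question, the entire accumulated Result is copied on every iteration, so the total cost grows quadratically with the final length.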
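The difference in copy semantics can be observed directly. This is a small sketch, assuming Delphi 2009 or later on Windows: assigning a heap-allocated UnicodeString only bumps a reference count and shares the buffer, while assigning a WideString allocates a new BSTR. The strings are built at run time because assigning a string literal behaves differently.

```delphi
program CopySemantics;

{$APPTYPE CONSOLE}

var
  U1, U2: UnicodeString;
  W1, W2: WideString;
begin
  // Build the strings at run time so they are heap-allocated, not literals.
  U1 := StringOfChar('x', 1000);
  W1 := U1;

  U2 := U1; // UnicodeString: reference count goes up, buffer is shared
  W2 := W1; // WideString: a new BSTR is allocated and all characters copied

  Assert(Pointer(U2) = Pointer(U1));
  Assert(Pointer(W2) <> Pointer(W1));
  Writeln('UnicodeString shared: ', Pointer(U2) = Pointer(U1));
  Writeln('WideString shared: ', Pointer(W2) = Pointer(W1));
end.
```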

Naturally you should prefer to do all your work with the native types, either UnicodeString or TStringBuilder, and convert to WideString at the last possible moment.

That is generally a good policy. You don't want to use WideString internally since it's purely an interop type. So only convert to (and from) WideString at the interop boundary.
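Following that advice, the question's function could be restructured roughly as below. This is a sketch assuming Delphi 2009 or later (where TStringBuilder lives in SysUtils); GetPiece is a hypothetical stand-in for the question's "... something ...". All the accumulation happens on native types, and the only WideString conversion occurs once, at the end.

```delphi
program BuilderDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for the "... something ..." in the question.
function GetPiece(i: Integer): UnicodeString;
begin
  Result := IntToStr(i) + ';';
end;

function foo: WideString;
var
  i: Integer;
  Builder: TStringBuilder;
begin
  Builder := TStringBuilder.Create;
  try
    for i := 1 to 1000 do
      Builder.Append(GetPiece(i)); // all the work happens on native types
    Result := Builder.ToString;     // a single UnicodeString -> WideString conversion
  finally
    Builder.Free;
  end;
end;

begin
  Assert(Copy(foo, 1, 4) = '1;2;');
  Assert(Length(foo) = 3893);
  Writeln(Length(foo));
end.
```

If foo is only called from Delphi code, it can simply return UnicodeString, and the conversion moves out to whichever caller actually hands the string across an interop boundary.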