string := const : why different implementation for local and result?


In Delphi, a function result is frequently implemented as a var parameter (not an out parameter, despite the QC ticket).

String constants are basically variables with a negative reference count, which should suppress automatic memory [de]allocation. http://docwiki.embarcadero.com/RADStudio/XE3/en/Internal_Data_Formats#Long_String_Types

It really does suppress it: the code below does not leak.

type
  TDealRecord = record
    id_Type: Integer;
    Price: extended;
    Remark: String;
  end;
const const_loop = 100000000;

function TestVar: TDealRecord;
//procedure TestVar;
var
  Li: Integer;
  LRec: TDealRecord;
begin
  for Li := 1 to const_loop do begin
     FillChar(Lrec,SizeOf(LRec), 0);
     LRec.Remark := 'Test';

//     FillChar(Result,SizeOf(Result), 0);
//     Result.Remark := 'Test';
  end;
end;

But change the manipulated variable from the local record to Result, and it immediately starts to leak heavily.

function TestVar: TDealRecord;
//procedure TestVar;
var
  Li: Integer;
  LRec: TDealRecord;
begin
  for Li := 1 to const_loop do begin
//     FillChar(Lrec,SizeOf(LRec), 0);
//     LRec.Remark := 'Test';

     FillChar(Result,SizeOf(Result), 0);
     Result.Remark := 'Test';
  end;
end;

It turns out that string := const is implemented with different RTL calls, depending on the LValue:

  1. Result: AnsiString -> LStrAsg
  2. Result: UnicodeString -> UStrAsg
  3. Local var: UnicodeString -> UStrLAsg
  4. Local var: AnsiString -> LStrLAsg

And while the latter two just copy the pointer, as expected, the former two copy the string to a new instance, as if I had added a UniqueString call to them.
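One way to observe this without reading the generated assembly is to print the reference count right after each assignment. The following is only a minimal sketch: the program and routine names (RefCountDemo, ViaResult, ViaLocal) are made up for illustration, and it assumes a compiler version that provides System.StringRefCount. If the behaviour described above holds, the local should report a negative count (it still points at the literal) and Result a positive one (it points at a fresh heap copy).

```pascal
program RefCountDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: assigns a literal to the function result,
// which per the list above goes through LStrAsg/UStrAsg.
function ViaResult: string;
begin
  Result := 'Test';
  Writeln('Result refcount: ', StringRefCount(Result));
end;

// Hypothetical helper: assigns the same literal to a local variable,
// which per the list above goes through LStrLAsg/UStrLAsg.
procedure ViaLocal;
var
  L: string;
begin
  L := 'Test';
  Writeln('Local refcount:  ', StringRefCount(L));
end;

var
  S: string;
begin
  ViaLocal;
  S := ViaResult;
end.
```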

Why the difference?

2

There are 2 answers

10
Arioch 'The On BEST ANSWER

After a discussion with David Heffernan, I am starting to think that the Delphi compiler simply does not know what kind of value it is assigning to the variable; a sort of "type erasure" takes place. It cannot tell a global constant from a local on-stack variable or a local string expression, so it cannot tell whether the source will still exist after the function exits. While we know that the source is a string literal, a global constant, or something else whose lifetime is independent of the function's execution, the compiler has lost that information. Instead it plays defensively and always clones the value, just in case the source ceases to exist. I am not sure, but that looks reasonable. Still, the consequence of this rough, indiscriminate codegen rule is one more gotcha in Delphi :-(

18
Arnaud Bouchez On

In Delphi, constant strings are always copied when assigned to a global variable, but not when assigned to a local variable, to avoid access violations in some borderline cases.

Use the source, Luke!

See this code extraction from System.pas:

{ 99.03.11
  This function is used when assigning to global variables.

  Literals are copied to prevent a situation where a dynamically
  allocated DLL or package assigns a literal to a variable and then
  is unloaded -- thereby causing the string memory (in the code
  segment of the DLL) to be removed -- and therefore leaving the
  global variable pointing to invalid memory.
}
procedure _LStrAsg(var dest; const source);
var
  S, D: Pointer;
  P: PStrRec;
  Temp: Longint;
begin
  S := Pointer(source);
  if S <> nil then
  begin
    P := PStrRec(Integer(S) - sizeof(StrRec));
    if P.refCnt < 0 then   // make copy of string literal
    begin
      Temp := P.length;
      S := _NewAnsiString(Temp);
      Move(Pointer(source)^, S^, Temp);
      P := PStrRec(Integer(S) - sizeof(StrRec));
    end;
    InterlockedIncrement(P.refCnt);
  end;
....

So in short, this is by design: to avoid access violations when a DLL or package that contained some constant values sent back to the main process is unloaded, a copy of the literal is always made.

You have two functions:

  • LStrAsg or UStrAsg, which is generated by the compiler when the destination may outlive the current scope (a global variable or, as in the question, a function result), so a constant source has to be copied - this is the code above;
  • LStrLAsg or UStrLAsg (the added L stands for "local"), which is generated by the compiler when the destination is a local variable: in this case a literal is never copied, so it is faster than the code above.
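Given that copy, the leak in the question comes from FillChar overwriting Result.Remark while it still points to a heap-allocated string, so that string is never released. A minimal sketch of one way around it, assuming the record layout from the question, is to call Finalize before zeroing the record. The program name and the smaller loop count are illustrative only, and ReportMemoryLeaksOnShutdown assumes a Delphi version that ships with FastMM as the default memory manager.

```pascal
program NoLeakDemo;
{$APPTYPE CONSOLE}

type
  TDealRecord = record
    id_Type: Integer;
    Price: Extended;
    Remark: string;
  end;

const
  const_loop = 1000000; // smaller than in the question, just for a quick run

function TestVar: TDealRecord;
var
  Li: Integer;
begin
  for Li := 1 to const_loop do
  begin
    // Release whatever Result.Remark currently references and set it to nil.
    Finalize(Result);
    // Zeroing is now safe: no live string pointer is being overwritten.
    FillChar(Result, SizeOf(Result), 0);
    Result.Remark := 'Test';
  end;
end;

var
  R: TDealRecord;
begin
  // Ask the memory manager to report any leaked blocks at exit.
  ReportMemoryLeaksOnShutdown := True;
  R := TestVar;
  Writeln(R.Remark);
end.
```

Assigning the fields individually instead of calling FillChar avoids the problem as well, since ordinary string assignment releases the previous value.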