I maintain a Delphi 7 application whose server part can also be compiled with CrossKylix. For performance reasons, I am benchmarking multithreading and the use of critical sections.
I wrote a console application that creates 100 TThread instances, each of which computes a Fibonacci number. Then I added a critical section so that only one thread computes at a time. As expected, the application is faster without the critical section.
Then I wrote a console application that creates 100 TThread instances, each of which adds words to a local TStringList and sorts it. Again I added a critical section so that only one thread executes at a time. On Windows, as expected, the application runs faster without the critical section. On Linux, however, the version with the critical section runs twice as fast as the version without it.
The Linux machine has a 6-core AMD Opteron, so the application should benefit from multithreading.
Can somebody explain why the version with the critical section is faster?
Edit (added some code)
Thread creation and waiting
tmpDeb := Now;
// Create all threads suspended so that none starts before the others exist.
i := NBTHREADS;
while i > 0 do
begin
  tmpFiboThread := TFiboThread.Create(true);
  tmpFiboThread.Init(i, ParamStr(1) = 'Crit');
  Threads.AddObject(IntToStr(i), tmpFiboThread);
  i := i-1;
end;
// Start every thread.
i := 0;
while i < NBTHREADS do
begin
  TFiboThread(Threads.Objects[i]).Resume;
  i := i+1;
end;
// Wait for every thread to finish.
i := 0;
while i < NBTHREADS do
begin
  TFiboThread(Threads.Objects[i]).WaitFor;
  i := i+1;
end;
WriteLn('Total processing time: ' + inttostr(MilliSecondsBetween(Now, tmpDeb)) + ' milliseconds');
The TThread and critical section usage
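// The interface uses clause is not shown; this code needs Classes (TThread,
// TStringList), SyncObjs (TCriticalSection) and SysUtils (IntToStr, Now, FreeAndNil).
type TFiboThread = class(TThread)
  private
    n : Integer;
    UseCriticalSection : Boolean;
  protected
    procedure Execute; override;
  public
    ExecTime : Integer;
    procedure Init(n : integer; WithCriticalSect : Boolean);
end;
var
  CriticalSection : TCriticalSection;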
implementation
uses DateUtils;
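// CPU-only workload: iterates about n + 100000000 times and allocates no memory.
// The value overflows Integer (this assumes overflow checking is off, the default);
// only the running time matters here.
function fib(n: integer): integer;
var
  f0, f1, tmpf0, k: integer;
begin
  f1 := n + 100000000;
  if f1 > 1 then
  begin
    k := f1-1;
    f0 := 0;
    f1 := 1;
    repeat
      tmpf0 := f0;
      f0 := f1;
      f1 := f1+tmpf0;
      dec(k);
    until k = 0;
  end
  else
    if f1 < 0 then
      f1 := 0;
  Result := f1;
end;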
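// Allocation-heavy workload: every Add stores a newly allocated string.
function StringListSort(n: integer): integer;
var
  tmpSL : TStringList;
  i : Integer;
begin
  tmpSL := TStringList.Create;
  try
    i := 0;
    while i < n + 10000 do
    begin
      tmpSL.Add(inttostr(MilliSecondOf(now)));
      i := i+1;
    end;
    tmpSL.Sort;
    Result := StrToInt(tmpSL.Strings[0]);
  finally
    // Free the list even if an exception is raised.
    tmpSL.Free;
  end;
end;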
{ TFiboThread }
procedure TFiboThread.Execute;
var
  tmpStr : String;
  tmpDeb : TDateTime;
begin
  inherited;
  if Self.UseCriticalSection then
    CriticalSection.Enter;
  try
    tmpDeb := Now;
    tmpStr := inttostr(fib(Self.n));
    //tmpStr := inttostr(StringListSort(Self.n));
    Self.ExecTime := MilliSecondsBetween(Now, tmpDeb);
  finally
    // Release the lock even if the work raises an exception.
    if Self.UseCriticalSection then
      CriticalSection.Leave;
  end;
  Self.Terminate;
end;
procedure TFiboThread.Init(n : integer; WithCriticalSect : Boolean);
begin
  Self.n := n;
  Self.UseCriticalSection := WithCriticalSect;
end;
initialization
  CriticalSection := TCriticalSection.Create;
finalization
  FreeAndNil(CriticalSection);
Edit 2
I read why-using-more-threads-makes-it-slower-than-using-less-threads. As I understand it, context switching costs a lot more CPU time on Linux with a Kylix-compiled binary than it does on Win32.
Sorting a string list involves a lot of memory allocations, i.e. calls to the memory manager. The memory manager itself is thread-safe, which means it uses some kind of critical section internally. So if a hundred threads run simultaneously without the global critical section, they make thousands of calls to the memory manager, which means thousands of internal lock acquisitions (instead of one acquisition of the global critical section per thread).
That is why the pure Fibonacci function (without building and sorting a string list) behaves as expected: it has no hidden internal locks.
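To put rough numbers on this, here is a minimal sketch that estimates a lower bound on the allocator calls made by the string-list benchmark. It rests on several assumptions: NBTHREADS is taken to be 100 (the question says 100 threads, but the constant's value is not shown), the threads are initialised with n = 1..100 as in the creation loop, and each Add is counted as exactly one string allocation, ignoring list growth and frees. MinAllocCalls and TotalMinAllocCalls are hypothetical helper names, and whether every allocator call really takes a lock depends on the memory manager in use.

```pascal
program AllocLockEstimate;
{$APPTYPE CONSOLE}

const
  NBTHREADS  = 100;    // assumed value, taken from "100 TThread" in the question
  BASE_ITEMS = 10000;  // from the loop bound "n + 10000" in StringListSort

// Lower bound on allocator calls for one StringListSort(n):
// each of the n + 10000 Add calls stores a newly allocated string.
function MinAllocCalls(n: Integer): Integer;
begin
  Result := n + BASE_ITEMS;
end;

// Total lower bound across threads initialised with n = 1..ThreadCount.
function TotalMinAllocCalls(ThreadCount: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to ThreadCount do
    Result := Result + MinAllocCalls(i);
end;

begin
  WriteLn('Allocator calls (lower bound): ', TotalMinAllocCalls(NBTHREADS));
  WriteLn('Global critical section entries: ', NBTHREADS);
end.
```

Under these assumptions the estimate is 1005050 allocator calls against 100 entries into the global critical section, so if each allocator call contends for an internal lock, the unguarded version has far more opportunities for threads to block one another.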