Background
Recently my colleague add some new tests to our test project. One of them has not passed on or continuous integration system. Since we have around 800 tests and it takes an hour to run all of it, we often make a mistake and run on our dev machines only the tests which we've currently implemented. This method has its weakness because from time to time tests are passing locally but then fail on the integration system. Of course, someone could say "it is not a mistake, tests should be independent of each other!".
In ideal world.. sure, but not in my world. Not in a world in which you have a lot of singletons initialized in initialization
section, lot of global variables introduced by the Delphi itself, an OTL thread pool initialized in background, DevExpress methods hooked to your controls for painting purposes.. and dozens of other things which I am not aware of. So in final result one test can change the behavior of other test. (Which of course is bad itself and I am glad it happen, because hopefully I will be able to fix another dependency).
I've started the whole test package on my machine and I've achieved the same results as on integration system. So far so good, now I've started turning off tests until I've narrowed down the one test which interfered with the one recently added. They have nothing in common. I've digged deeper, and narrowed the problem to one single line. If I comment it - test passes, if not - test fails.
Problem
We have such code to convert text data into longitude coords(only important part was included):
procedure TTerminalNVCParserTest_Unit.TranslateGPS_ValidGPSString_ReturnsValidCoords;
const
CStrGPS = 'N5145.37936E01511.8029';
var
LLatitude, LLongitude: Integer;
LLong: Double;
LStrLong, LTmpStr: String;
LFS: TFormatSettings;
begin
FillChar(LFS, SizeOf(LFS), 0);
LFS.DecimalSeparator := '.';
LStrLong := Copy(CStrGPS, Pos('E', CStrGPS)+1, 10);
LTmpStr := Copy(LStrLong,1,3);
LLong := StrToFloatDef( LTmpStr, 0, LFS );
LTmpStr := Copy(LStrLong,4,10);
LLong := LLong + StrToFloatDef( LTmpStr, 0, LFS)*1/60;
LLongitude := Round(LLong * 100000);
CheckEquals(1519671, LLongitude);
end;
The problem is that LLongitude
is sometimes equal to 1519671 and sometimes it gives 1519672. And whether it gives 1519672 or not is dependent from other totally unrelated piece of code in different method in different test:
FormXtrMainImport.JvWizard1.SelectNextPage;
I've quadruple checked SelectNextPage method, it does not fire any event that could change how the FPU unit works. It does not change the value of RoundingMode
it is always set up on rmNearest.
Besides, shouldn't Delphi use here a banker rule? :
LLongitude := Round(LLong * 100000); //LLong * 100000 = 1519671,5
If banker rule is used it should give me always 1519672 not 1519671.
I guess that there must be some corrupted memory which is causing the problem and the line with SelectNextPage
only reveals it. However the same problem occurs on three different machines.
Anyone could give me an idea how to trace this problem? Or how to assure always a stable result of conversion?
To those who misunderstood my question
I've checked the RoundingMode and I've mentioned about it earlier: "I've quadruple checked SelectNextPage method, it does not fire any event that could change how the FPU unit works. It does not change the value of RoundingMode it is always set up on rmNearest." RoundingMode is always rmNearest before any runding occurs in the above code.
This is not the real test. This is only the code to show where problem occurs.
Video description added.
So, in striving to improve my question I've decided to add the video that shows my bizzare problem. This is the production code, I've only added assertions for checking RoundingMode. In first part of the video I am showing the original test (@Sir Rufo , @Craig Young), the method responsible for conversion and the correct result which I am getting. In the second part I am showing that when I add another unrelated test I am getting incorrect result. Video can be found here
Reproducible example added
It all boils down to below code:
procedure FloatingPointNumberHorror;
const
CStrGPS = 'N5145.37936E01511.8029';
var
LLongitude: Integer;
LFloatLon: Double;
adcConnection: TADOConnection;
qrySelect: TADOQuery;
LCSVStringList: TStringList;
begin
//Tested on Delphi 2007, 2009, XE 5 - Windows 7 64 bit
adcConnection := TADOConnection.Create(nil);
qrySelect := TADOQuery.Create(adcConnection);
LCSVStringList := TStringList.Create;
try
//Prepare on the fly csv file required by ADOQuery
LCSVStringList.Add('Col1;Col2;');
LCSVStringList.Add('aaaa;1234;');
LCSVStringList.SaveToFile(ExtractFilePath(ParamStr(0)) + 'test.csv');
qrySelect.CursorType := ctStatic;
qrySelect.Connection := adcConnection;
adcConnection.ConnectionString := 'Provider=Microsoft.Jet.OLEDB.4.0;Data Source='
+ ExtractFilePath(ParamStr(0)) + ';Extended Properties="text;HDR=yes;FMT=Delimited(;)"';
// Real stuff begins here, above we have only preparation of environment.
LFloatLon := 15 + 11.8029*1/60;
LLongitude := Round(LFloatLon * 100000);
Assert(LLongitude = 1519671, 'Asertion 1'); //Here you will NOT receive error.
//This line changes the FPU control word from $1372 to $1272.
//This causes the change of Precision Control Field (PC) from 3 which means
//64bit precision to 2 which means 53 bit precision thus resulting in improper rounding?
//--> ADODB.TParameters.InternalRefresh->RefreshFromOleDB -> CommandPrepare.Prepare(0)
qrySelect.SQL.Text := 'select * from [test.csv] WHERE 1=1';
LFloatLon := 15 + 11.8029*1/60;
LLongitude := Round(LFloatLon * 100000);
Assert(LLongitude = 1519671, 'Asertion 2'); //Here you will receive error.
finally
adcConnection.Free;
LCSVStringList.Free;
end;
end;
Just copy and paste this procedure and add ADODB
to uses clause. It seems that the problem is caused by some Microsoft COM object which is used by Delphi's ADO wrapper. This object is changing FPU control word, but it is not changing the rounding mode. It is changing precision control.
Here is the FPU screenshot before and after firing up ADO-related method.:
The only solution which comes to my mind is to use Get8087CW
before using ADO code and then Set8087CW
to setup control world with previously stored one.
The problem is most likely because something else in your code is changing the floating point rounding mode. Have a look at this program:
The output is:
Clearly your particular calculation depends on the floating point rounding mode as well as the actual input value and the code. Indeed the documentation does make this point:
So you need to first of all find whatever else in your program is modifying the floating point control word. And then you must make sure that you set it back to the desired value whenever that mis-behaving code executes.
Congratulations on debugging this further. In fact it is actually the multiplication
which is influenced by the precision control.
To see that this is so, look at this program:
Output
So, precision control influences the results of the multiplication, at least in the 80 bit registers of the 8087 unit.
The compiler doesn't store the result of that multiplication to a variable and it remains in the FPU, so this difference flows on to the
Round
.Notice how the result of the multiplication is left in
ST(0)
because that's exactly whereRound
expects its parameter.In fact, if you pull the multiplication into a separate statement, and assign it to a variable, then the behaviour becomes consistent again:
The above code produces the same output for both
$1272
and$1372
.There basic issue remains though. You have lost control of the floating point control state. To deal with this you'll need to keep control of your FP control state. Whenever you call into a library that may modify it, store it away before calling, and then restore when the call returns. If you want to have anything like repeatable, reliable and robust floating point code, this sort of game is, unfortunately, inevitable.
Here is my code to do that:
Note that this code handles both floating point units and so is ready for 64-bit which uses the SSE unit rather than the 8087 unit.
For what it is worth, here is my SSCCE:
Output