I have a rather complicated issue with my small package. Basically, I'm building a GARCH(1,1) model with the rugarch package, which is designed exactly for this purpose. It uses a chain of solvers (provided by Rsolnp and nloptr, both general-purpose nonlinear optimizers) and works fine. I'm testing my method with testthat against a benchmark solution, which I previously obtained by manually running the code under Windows (the main platform the package is meant to be used on).
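For reference, the fit is essentially of this form (a minimal sketch with placeholder data; the actual specification and inputs in my package differ):

```r
library(rugarch)

# Placeholder return series; the real package uses its own data.
returns <- rnorm(1000, sd = 0.01)

# Standard GARCH(1,1) specification with a simple constant-mean model.
spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(0, 0), include.mean = TRUE)
)

# solver = "hybrid" tries solnp first and falls back to the other solvers
# (nlminb, gosolnp, nloptr) if it fails to converge.
fit <- ugarchfit(spec, data = returns, solver = "hybrid")
```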
Now, I initially had some issues because the solution was not consistent across several consecutive runs. The difference was within the tolerance I specified for the solver (the default solver = 'hybrid', as recommended by the documentation), so my guess was that some sort of randomization was involved. I therefore eliminated both the random seed and parallelization (the "legitimate" reasons for such differences), and the issue was solved: I now get identical results every time under Windows, so I run R CMD CHECK and testthat succeeds.
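The test itself is just a tolerance-based comparison of the freshly produced output against the stored benchmark, roughly like this (a sketch: read_sequence() is my own helper, and the tolerance value is illustrative):

```r
library(testthat)

test_that("fitted sequence matches the Windows benchmark", {
  # read_sequence() is the package's own helper for loading the saved output;
  # file_out is the fresh result, file_benchmark the stored Windows reference.
  expect_equal(
    read_sequence(file_out),
    read_sequence(file_benchmark),
    tolerance = 1e-8  # illustrative value, on the order of the solver tolerance
  )
})
```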
After that, I decided to automate things a little, and the build process is now controlled by Travis. To my surprise, the result under Linux differs from my benchmark; the log states:
```
read_sequence(file_out) not equal to read_sequence(file_benchmark)
Mean relative difference: 0.00000014688
```
Rebuilding several times yields the same result, and the difference is always the same, which means that under Linux the solution is also consistent. As a temporary fix, I'm setting a tolerance limit depending on the platform, and the test passes (see latest builds).
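Concretely, the platform-dependent tolerance amounts to something like this (the numbers are illustrative, not the ones I actually use):

```r
# Keep the comparison tight on Windows (where the benchmark was produced) and
# loosen it elsewhere so that the ~1.5e-07 relative difference seen on Linux
# still passes. .Platform$OS.type is "windows" on Windows and "unix" otherwise.
tol <- if (.Platform$OS.type == "windows") 1e-8 else 1e-6

expect_equal(read_sequence(file_out), read_sequence(file_benchmark), tolerance = tol)
```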
So, to sum up:
- A numeric procedure produces identical output across runs on Windows and on Linux, taken separately;
- However, the Windows and Linux outputs differ from each other, and the difference is not caused by random seeds and/or parallelization.
I generally only care about supporting Windows and do not plan to make a public release, so this is not a big deal for my package per se. But I'm bringing it to attention because there may be an issue with one of the solvers, which are quite widely used.
And no, I'm not asking you to fix my code: a platform-dependent tolerance is quite ugly, but it does the job so far. The questions are:
- Is there anything else that can "legitimately" (or "naturally") lead to the described difference?
- Are low-level numeric routines supposed to produce identical results on all platforms, or could it be that I'm expecting too much?
- Should I care a lot about this? Is this a common situation?