I wrote an algorithm to solve the N-puzzle problem using breadth-first search. In trying to make things faster, I decided to allocate a large array up front instead of repeatedly pushing and shifting values on an empty array.
By chance I noticed odd behavior where allocating the large array twice actually made the wall clock time faster. I created a gist with the complete code, but the part that's giving me odd behavior is here:
var values = new Array(1000000);
function breadthFirstSearch(state, goalState) {
values = new Array(1000000);
// implementation of a breadth first search for the n-puzzle problem
}
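To illustrate the idea behind the preallocation: `Array.prototype.shift` has to move every remaining element, so a queue built on push/shift gets slow as the frontier grows. Here's a simplified sketch of how I'm using the array as a queue with explicit read/write indices (the `isGoal` and `neighbors` helpers are just placeholders, not the actual code from my gist):

```javascript
// Sketch of a BFS queue backed by a preallocated array.
// `head` is the next index to read, `tail` the next index to write.
// This avoids Array.prototype.shift(), which moves all remaining
// elements on every call.
var values = new Array(1000000);

function bfsSketch(start, isGoal, neighbors) {
  var head = 0, tail = 0;
  values[tail++] = start;
  var seen = new Set([start]);
  while (head < tail) {
    var state = values[head++];
    if (isGoal(state)) return state;
    for (var next of neighbors(state)) {
      if (!seen.has(next)) {
        seen.add(next);
        values[tail++] = next;
      }
    }
  }
  return null; // queue exhausted without reaching the goal
}
```

For example, searching the integer number line with `bfsSketch(0, s => s === 5, s => [s - 1, s + 1])` returns `5`.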
The typical runtime when values is initialized twice, using node.js, is about 550ms on my machine. When I comment out one of the places where I instantiate values, so that it is only instantiated once, the runtime increases to about 650ms-700ms. It seems to me like the runtime should decrease when allocating the array only once. I created a video that explains what I am doing and shows the run times on my machine here.
Oddly, when I ran it on repl.it the runtime is about the same whether it's commented out or not, which makes me think it has something to do with the v8 engine.
Can anyone give me an explanation why the wall clock time increases when doing what should be less work?
I have written most of what will be in this answer in the comments, because at the time of writing those comments, this question was locked.
I have extended the original code with better time measuring calls, using `performance.now`, which is available in modern browsers or as the node module `performance-now`. The code is uploaded as this gist. What I've changed:

- Added `function init()`, which (re)initializes all global variables; otherwise the program won't work across multiple runs.
- Duplicated `function breadthFirstSearch` into `breadthFirstSearchUncommented` and `breadthFirstSearchCommented`. The first one initializes `values = new Array(1000000);` on each call, and the second one uses the global variable `values` as it is, i.e. the statement mentioned before is commented / removed.
- Modified `function time` to accept which BFS function should be used.
- Added `function measureTime`, which calls `time` with the given BFS function argument and measures the min, max and average of `runs` (argument) measurements. At the end, it outputs the total time (the sum of times for all `time` calls). Code for this function is below and, of course, on the gist, at the end of the `index.js` file.
- Removed all logging messages; the only log messages left are at the beginning and end of each `measureTime` call.

Function `measureTime`:
The actual answer: The question is why the program which defines the new array twice, i.e. once at the beginning and once on each BFS function call, was faster than when that initialization statement was commented out.
The answer is that there is no explanation for this. It's not a result of how JavaScript VMs work, nor of any JavaScript quirk; it's much simpler: the statement in question is actually false.
What the OP was saying, and later was even kind enough to provide a video explanation of, was true only for single runs. The most important reason is that, as we all know, modern operating systems do a lot of work simultaneously (I know this is not exactly true, but bear with me for the sake of this explanation). This means it's very difficult to get the exact same runtime across multiple runs, which is why we see some fluctuation in execution times.
To get more accurate measurements, I called the function `measureTime` 100 times for the commented function and another 100 times for the uncommented one. This creates a sample that minimizes the fluctuations. I put together a comparison table for different execution environments. All tests were run on OS X Yosemite, using the most recent stable versions of the browsers and io.js as of 6/18/2015.
This is the best table I could do since Stack Overflow doesn't support table markdown. For an actual table style, take a look at the gist.
The table makes it clear that the original claim, that the code with the uncommented array reinitialization is faster, is false: it does not hold in any of the four environments I tested.
The lesson to be learned here: do not draw performance conclusions from just a few runs. Collect data from multiple test runs and interpret the average of all values, or, in some cases, use more sophisticated statistical methods to get more accurate results.