Why does my 8-puzzle solution runs faster when I create an array twice

232 views Asked by At

I wrote an algorithm to solve N-puzzle problem using breadth-first search. In trying to make things faster I decided to allocate a large array up front instead of repeatedly pushing and shifting values to an empty array.

By chance I noticed an odd behavior where allocating the large array twice actually made the wall clock time faster. I created a gist with the complete code but the part that's giving me odd behavior is here:

var values = new Array(1000000);

function breadthFirstSearch(state, goalState) {
  values = new Array(1000000);
  // implementation of a breadth first search for the n-puzzle problem
}

The typical runtime when values is initialized twice using node.js is about 550ms on my machine. When I comment out one of the places where I am instantiating values so I am only instantiating it once, the runtime increases to about 650ms - 700ms. It seems to me like the runtime should decrease when only allocating the array one time. I created a video to explain what I am doing and that shows the run times on my machine here.

Oddly, when I ran it on repl.it the runtime is about the same whether it's commented out or not, which makes me think it has something to do with the v8 engine.

Can anyone give me an explanation why the wall clock time increases when doing what should be less work?

1

There are 1 answers

1
Marko Gresak On

I have written most of what will be in this answer in the comments, because at the time of writing those comments, this question was locked.

I have extended original code with better time measuring calls, which are using performance.now, which is available in modern browser or as node module performance-now. The code is uploaded as this gist.

What I've changed:

  1. Added function init(), which (re)initializes all global variables, otherwise the program won't work across multiple runs.
  2. Duplicated function breadthFirstSearch into breadthFirstSearchUncommented and breadthFirstSearchCommented, first one initializes values = new Array(1000000); on each call and the second one uses global variable values as it is, i.e. the statement mentioned before is commented / removed.

  3. Modified function time to accept which BFS function should be used.

  4. Added function measureTime, which calls time with given BFS function argument and measures min, max and average of runs (argument) measures. At the end, it outputs total time (sum of times for all time calls). Code for this function is below and of course on the gist, at the end of index.js file.

  5. Removed add logging messages, only log messages left are the beginning and end of each measureTime call.

Function measureTime:

function measureTime(bfsFn, label, runs) {
  console.log('starting', runs, 'measures for', label);
  var diff = 0;
  var min = Number.MAX_VALUE, max = Number.MIN_VALUE;
  for (var i = 0; i < runs; i++) {
    init();
    var start = now();
    time(bfsFn);
    var d = now() - start;
    diff += d;
    min = Math.min(d, min);
    max = Math.max(d, max);
  }
  var avg = diff / runs;
  console.log('%s - total time for %d measures: %sms', label, runs, diff.toFixed(3));
  console.log('%s - avg time: %sms (%sms - %sms, Δt = %sms)',
    label, avg.toFixed(3), min.toFixed(3), max.toFixed(3), (max - min).toFixed(3));
}

The actual answer: The question is why was program which was defining new array twice, i.e. once at the beginning and once at each BFS function call, faster than when the initialization code was commented.

The answer is that there is no explanation for this. It's not a result of how JavaScript VMs work, it's not any JavaScript quirks, it's much simpler: The statement in question is actually false.

What OP was saying, and was later even kind enough to provide a video explanation, was true only for single runs. The most important reason is, as we all know, modern operating systems are doing a lot of work simultaneously -- I know this is not exactly true, but bear with me for the sake of this explanation. This means that it's very difficult to have exact same runtime in multiple runs, therefore we see some fluctuation in execution times among multiple runs.

To get more accurate measurements, I called function measureTime 100 times for each commented and another 100 times for uncommented function. This created a sample which minimizes the fluctuations.

I created table of comparison for different execution environments. All tests were ran on OS X Yosemite, using most recent stable versions of browsers and io.js as of 6/18/2015.

|   Environment   |  Uncommented time [ms]  |  Commented time [ms]  |
| :-------------- | :---------------------- | :-------------------- |
| Chrome          |         460.254         |        424.725        |
| Firefox         |         796.471         |        781.662        |
| Safari          |         529.492         |        474.580        |
| Node.js (io.js) |         411.804         |        394.807        |

This is the best table I could do since Stack Overflow doesn't support table markdown. For an actual table style, take a look at the gist.

In this table, it's clear that original statement saying code with uncommented array reinitialization is faster, as we can see that this is false for all four environments I tested.

The lesson to be learned here is do not make performance conclusions based on just few runs. Data of multiple test runs should be collected and be interpreted average of all values or in some cases, more sophisticated statistics methods should be used to get more accurate results.