Node.js not waiting for nested inner function calls to execute

366 views Asked by At

Admittedly I'm a novice with node, but it seems like this should be working fine. I am using multiparty to parse a form, which returns an array. I am then using a for each to step through the array. However - the for each is not waiting for the inner code to execute. I am a little confused as to why it is not, though.

var return_GROBID =  function(req, res, next) {
  var form = new multiparty.Form();

  var response_array = [];
  form.parse(req, function(err, fields, files) {

    files.PDFs.forEach(function (element, index, array) {
      fs.readFile(element.path, function (err, data) {
        var newPath = __dirname + "/../public/PDFs/" + element.originalFilename;
        fs.writeFile(newPath, data, function (err) {
          if(err) {
            res.send(err);
          }
          GROBIDrequest.GROBID2js(newPath, function(response) {
            response_array.push(response);
            if (response_array.length == array.length) {
                res.locals.body = response_array;
                next();
            }
          });
        });
      });
    });
 });
}

If someone can give me some insight on the proper way to do this that would be great.

EDIT: The mystery continues. I ran this code on another machine and IT WORKED. What is going on? Why would one machine be inconsistent with another?

1

There are 1 answers

5
RyanWilcox On

I'd guess the PDFs.forEach is you just calling the built-in forEach function, correct?

In Javascript many things are asynchronous - meaning that given:

linea();
lineb();

lineb may be executed before linea has finished whatever operation it started (because in asynchronous programming, we don't wait around until a network request comes back, for example).

This is different from other programming languages: most languages will "block" until linea is complete, even if linea could take time (like making a network request). (This is called synchronous programming).

With that preamble done, back to your original question:

So forEach is a synchronous function. If you rewrote your code like the following, it would work (but not be useful):

 PDFs.forEach(function (element, index, array) {
   console.log(element.path)
 }

(console.log is a rare synchronous method in Javascript).

But in your forEach loop you have fs.readFile. Notice that last parameter, a function? Node will call that function back when the operation is complete (a callback).

Your code will currently, and as observed, hit that fs.readFile, say, "ok, next thing", and move on to the next item in the loop.

One way to fix this, with the least changing the code, is to use the async library.

async.forEachOf(PDFs, function(value, key, everythingAllDoneCallback) {

 GROBIDrequest.GROBID2js(newPath, function(response) {
        response_array.push(response);
        if (response_array.length = array.length) {
          ...
        }

        everythingAllDoneCallback(null)

} );

With this code you are going through all your asynchronous work, then triggering the callback when it's safe to move on to the next item in the list.

Node and callbacks like this are a very common Node pattern, it should be well covered by beginner material on Node. But it is one of the most... unexpected concepts in Node development.

One resource I found on this was (one from a set of lessons) about NodeJS For Beginners: Callbacks. This, and playing around with blocking (synchronous) and non-blocking (asynchronous) functions, and hopefully this SO answer, may provide some enlightenment :)