Issues performing network I/O in a NodeJS worker thread

I have a script that will download thousands of files from a server, perform some CPU-intensive calculations on those files, and then upload the results somewhere. As an added level of complexity, I want to limit the number of concurrent connections to the server where I'm downloading the files.

To get the CPU-intensive calculations off the event loop, I used workerpool by josdejong. I also figured that, since only a limited number of threads is spun up at any given time, I could use that to limit the number of concurrent connections to my server, so I tried putting the network I/O in the worker like so (TypeScript):

import { IncomingMessage } from "http";
import Axios from "axios";
import workerpool from "workerpool";

const pool = workerpool.pool({
    minWorkers: "max",
});

async function processData(file: string) {
    console.log("Downloading " + file);
    const csv = await Axios.request<IncomingMessage>({
        method: "GET",
        url: file,
        responseType: "stream"
    });
    console.log(csv);
    // TODO: Will process the file here
}

export default async function (files: string[]) {
    const promiseArray: workerpool.Promise<Promise<void>>[] = [];
    // Only processing the first file for now during testing
    files.slice(0, 1).forEach((file) => {
        promiseArray.push(pool.exec(processData, [file]));
    });
    await Promise.allSettled(promiseArray);
    await pool.terminate();
}

When I compile and run this code I see the message "Downloading test.txt", but after that I never see the following log statement (console.log(csv)).

I've tried various modifications to this code, including removing the responseType, removing await and just inspecting the Promise that's returned by Axios, making the function non-async, etc. No matter what I do, it always seems to crash on the Axios.request line.

Are worker threads not able to open HTTP connections or something? Or am I just making a silly mistake?

There are 2 answers

jfriend00 (best answer)

If it is not getting to this line of code:

console.log(csv);

Then either Axios.request() never fulfills its promise, or that promise rejects. You have no error handling at all in any of these functions, so if it were rejecting, you wouldn't know and wouldn't be logging the problem. As a starting point, I would suggest you instrument your code so you can log any rejections:

async function processData(file: string) {
    try {
        console.log("Downloading " + file);
        const csv = await Axios.request<IncomingMessage>({
            method: "GET",
            url: file,
            responseType: "stream"
        });
        console.log(csv);
    } catch (e) {
        console.log(e);          // log the error
        throw e;                 // propagate rejection/error
    }
}

As a general point of code design, you should be catching and logging any possible promise rejection at some level. You don't have to catch them all at the lowest calling level as they will propagate up through returned promises, but you do need to catch any possible rejection somewhere and, for your own development sanity, you will want to log it so you can see when it happens and what the error is.
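One place to do that logging in the question's code is right after Promise.allSettled, which never rejects: it records each rejection in its results array, so a failure inside pool.exec stays silent unless the results are inspected. A minimal sketch, assuming the results are checked once all tasks have settled; reportRejections, the Settled type and the sample data are hypothetical and exist only for illustration:

```typescript
// Local mirror of the built-in PromiseSettledResult shape,
// declared here only to keep the sketch self-contained.
type Settled<T> =
    | { status: "fulfilled"; value: T }
    | { status: "rejected"; reason: unknown };

// Hypothetical helper: logs every rejection and returns the collected reasons.
function reportRejections<T>(results: Settled<T>[]): unknown[] {
    const reasons: unknown[] = [];
    for (const result of results) {
        if (result.status === "rejected") {
            console.log("Task failed:", result.reason);
            reasons.push(result.reason);
        }
    }
    return reasons;
}

// Made-up sample results: one success and one failure.
const sample: Settled<string>[] = [
    { status: "fulfilled", value: "a.csv" },
    { status: "rejected", reason: new Error("download failed") },
];
const failures = reportRejections(sample);
console.log(failures.length + " task(s) failed");
```

In the question's code this could be called as reportRejections(await Promise.allSettled(promiseArray)) just before pool.terminate().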

Yevhen

You can't execute TypeScript directly in a worker thread. The pool.exec method accepts either a static JavaScript function or the name of a method exposed by a dedicated worker script, which is a JavaScript file whose path is passed to workerpool.pool().

Here is a quote from the workerpool readme:

Note that both function and arguments must be static and stringifiable, as they need to be sent to the worker in a serialized form. In case of large functions or function arguments, the overhead of sending the data to the worker can be significant.
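To illustrate what "static and stringifiable" implies for the code in the question, here is a minimal sketch. It assumes the function is shipped as source text and re-evaluated on the receiving side, and it imitates that with fn.toString() and new Function rather than with workerpool itself. The names makeDownloader, httpClient and rebuildFromSource are hypothetical; httpClient stands in for the imported Axios.

```typescript
// Hypothetical stand-in for code that depends on something outside the function,
// the way processData depends on the imported Axios.
function makeDownloader(): (url: string) => string {
    const httpClient = { get: (url: string) => "fetched " + url };
    // `download` closes over `httpClient`.
    return function download(url: string): string {
        return httpClient.get(url);
    };
}

// Imitates sending a function as text: only the source survives,
// and it is re-evaluated in a scope that knows nothing about `httpClient`.
function rebuildFromSource(fn: (url: string) => string): (url: string) => string {
    return new Function("return (" + fn.toString() + ")")() as (url: string) => string;
}

const download = makeDownloader();
const rebuilt = rebuildFromSource(download);

// The original works because its closure is intact.
console.log(download("test.txt"));

// The rebuilt copy fails as soon as it touches the missing dependency.
let failure = "";
try {
    rebuilt("test.txt");
} catch (e) {
    failure = e instanceof Error ? e.name : String(e);
}
console.log(failure);
```

If the same thing happens to processData, the first console.log would still run and the failure would only appear at the Axios.request line, which matches the behaviour described in the question.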

I'm trying to make this work with TypeScript. Possible ways to resolve this are:

  • write a worker function in TypeScript, compile it to a separate bundle with any bundler, then pass the path of the compiled file to workerpool.pool() and call the function by name through pool.exec. I managed to make this work; the only thing I'm not satisfied with is that with this solution you can't use nodemon (if you use it)
  • use a JS wrapper that compiles the TS source code and executes it using ts-node, then pass the path of that wrapper to workerpool.pool() and call the function by name through pool.exec. This solution won't work with bundlers