I have a script that will download thousands of files from a server, perform some CPU-intensive calculations on those files, and then upload the results somewhere. As an added level of complexity, I want to limit the number of concurrent connections to the server where I'm downloading the files.
To get the CPU-intensive calculations off the event thread, I leveraged workerpool by josdejong. I also figured I could take advantage of the fact that only a limited number of threads will be spun up at any given time to limit the number of concurrent connections to my server, so I tried putting the network I/O in the worker process like so (TypeScript):
import { IncomingMessage } from "http";
import Axios from "axios";
import workerpool from "workerpool";

const pool = workerpool.pool({
    minWorkers: "max",
});

async function processData(file: string) {
    console.log("Downloading " + file);
    const csv = await Axios.request<IncomingMessage>({
        method: "GET",
        url: file,
        responseType: "stream"
    });
    console.log(csv);
    // TODO: Will process the file here
}

export default async function (files: string[]) {
    const promiseArray: workerpool.Promise<Promise<void>>[] = [];
    // Only processing the first file for now during testing
    files.slice(0, 1).forEach((file) => {
        promiseArray.push(pool.exec(processData, [file]));
    });
    await Promise.allSettled(promiseArray);
    await pool.terminate();
}
When I compile and run this code I see the message "Downloading test.txt", but after that I never see the following log statement (console.log(csv)).
I've tried various modifications to this code, including removing the responseType, removing await and just inspecting the Promise that's returned by Axios, making the function non-async, etc. No matter what, it always seems to crash on the Axios.request line.
Are worker threads not able to open HTTP connections or something? Or am I just making a silly mistake?
If it is not getting to the console.log(csv) line, then either the Axios.request() call is never fulfilling its promise or that promise is rejecting. You have no error handling at all in any of these functions, so if it were rejecting, you wouldn't know and wouldn't be logging the problem. As a starter, I would suggest you instrument your code so you can log any rejections.

As a general point of code design, you should be catching and logging any possible promise rejection at some level. You don't have to catch them all at the lowest calling level, as they will propagate up through returned promises, but you do need to catch any possible rejection somewhere and, for your own development sanity, you will want to log it so you can see when it happens and what the error is.
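
As a rough illustration of that kind of instrumentation, here is a minimal sketch. It is not your exact code: the names Task, withLogging, collectFailures and runAll are hypothetical helpers made up for this example, and the task argument stands in for whatever does the real work (in your case, the function that calls Axios.request()). The one thing worth noticing is that Promise.allSettled() never rejects, so unless you inspect its results, a failed task goes completely unnoticed.

```typescript
// Hypothetical shape of a unit of work; in the question this would be
// the function that downloads and processes one file.
type Task<T> = (file: string) => Promise<T>;

// Wrap a task so that any rejection is logged with the file name
// and then rethrown, letting it propagate to the caller as well.
function withLogging<T>(task: Task<T>): Task<T> {
    return async (file: string): Promise<T> => {
        try {
            return await task(file);
        } catch (err) {
            console.error(`Failed while processing ${file}:`, err);
            throw err;
        }
    };
}

// Promise.allSettled() resolves even when tasks fail, so each result
// has to be checked explicitly to find and log the rejections.
function collectFailures<T>(results: PromiseSettledResult<T>[]): unknown[] {
    const failures: unknown[] = [];
    for (const result of results) {
        if (result.status === "rejected") {
            console.error("Task rejected:", result.reason);
            failures.push(result.reason);
        }
    }
    return failures;
}

// Run every file through the logged task and return the rejection reasons.
async function runAll<T>(files: string[], task: Task<T>): Promise<unknown[]> {
    const logged = withLogging(task);
    const results = await Promise.allSettled(files.map((file) => logged(file)));
    return collectFailures(results);
}
```

In your script, the same idea would mean looping over the array returned by Promise.allSettled(promiseArray) and logging the reason of every entry whose status is "rejected". One thing the logged reason may help you check: as far as I know, workerpool serializes a function passed to pool.exec() and runs it inside the worker, so anything it references from the enclosing module, such as an imported Axios, may not be defined there.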