What's the correct / best way to perform costly per-process initialisation when using multiple workers, ensuring the workers can correctly communicate with the master?
As part of a custom locustfile.py I need to download a large dataset from a URL and load it into memory to generate GET request parameters (download time ~60s; load time ~5s). For a single-process Locust setup I have used the init event to perform this, and that works fine. However, when running with multiple workers (--processes=N) I've run into a few issues with simply using the init event:
Attempt 1: As init is fired once per process, using that event for all workers results in downloading the same files from all processes, which is inefficient and causes problems with files overwriting each other.
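For illustration, the overwriting half of this problem (though not the redundant downloads) could probably be avoided by writing to a unique temporary file and renaming it into place. This is only a sketch under assumptions: fetch_once, fake_download and the dataset.csv filename are hypothetical names, and the real download would replace the stand-in function.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def fetch_once(target: Path, download: Callable[[Path], None]) -> bool:
    """Populate target via download() unless it already exists.

    Returns True if this call performed the download.
    """
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a unique temp file in the same directory, then rename it
    # into place; os.replace is atomic when both paths are on the same
    # filesystem, so readers never see a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    os.close(fd)
    try:
        download(Path(tmp_name))
        os.replace(tmp_name, target)
    finally:
        # Only still present if the download or rename failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return True


def fake_download(dest: Path) -> None:
    # Hypothetical stand-in for the real HTTP download.
    dest.write_text("id,value\n1,a\n")


with tempfile.TemporaryDirectory() as workdir:
    dataset = Path(workdir) / "dataset.csv"
    first = fetch_once(dataset, fake_download)
    second = fetch_once(dataset, fake_download)
    content = dataset.read_text()
```

Two processes that start at the same moment can still both download, so this does not address the inefficiency; it only keeps them from corrupting each other's output.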
Attempt 2: Based on the test_data_management.py example, I tried having the master / local runner download via init, but deferring the loading on the workers (only from disk, re-using the already-downloaded files) until test_start:
from locust import events
from locust.runners import WorkerRunner

@events.init.add_listener
def on_locust_init(environment, **_kwargs):
    if not isinstance(environment.runner, WorkerRunner):
        # Download (if not exists in local files), read and load
        # into environment.dataset
        setup_dataset(environment)
...
@events.test_start.add_listener
def setup_worker_dataset(environment, **_kwargs):
    if isinstance(environment.runner, WorkerRunner):
        # Make the Dataset available for WorkerRunners (non-Worker will have
        # already downloaded the dataset via on_locust_init).
        setup_dataset(environment, skip_download_and_populate=True)
This seems to work OK if setup_worker_dataset() is quick (less than 1s). For larger datasets it takes longer, however, and I then see problems with the master / worker communication, and load generation terminates:
locust-swarm-0-f008c40/INFO/locust.runners: Worker locust-swarm-0-xxx failed to send heartbeat, setting state to missing.
locust-swarm-0-f008c40/INFO/locust.runners: Worker locust-swarm-0-yyy failed to send heartbeat, setting state to missing.
locust-swarm-0-f008c40/INFO/locust.runners: The last worker went missing, stopping test.
(Of note, setup_dataset() performs blocking / CPU-heavy work, so I expect it's preventing the gevent event loop from running.)
Essentially I need a "safe place" to perform some CPU-heavy work at the process/environment level before the User objects start running tasks.
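If the heartbeat failures really are caused by the load step starving the gevent loop, one possible mitigation is to do the work in chunks and yield between them. The sketch below is an assumption-laden illustration rather than a verified fix: load_rows_cooperatively, yield_to_loop and the string parsing are hypothetical stand-ins, and the callback is injected so the snippet runs without gevent installed. In a locustfile the callback would be lambda: gevent.sleep(0), which hands control to other greenlets.

```python
import time
from typing import Callable, Iterable, List


def load_rows_cooperatively(
    rows: Iterable[str],
    yield_to_loop: Callable[[], None],
    chunk_size: int = 1000,
) -> List[str]:
    """Parse rows, calling yield_to_loop() after every chunk_size rows."""
    parsed = []
    for count, row in enumerate(rows, start=1):
        # Stand-in for the real CPU-heavy parsing of one record.
        parsed.append(row.strip().upper())
        if count % chunk_size == 0:
            # Give other greenlets (e.g. the worker heartbeat) a turn.
            yield_to_loop()
    return parsed


# Standalone demo: record each yield instead of calling gevent.
# In a locustfile, pass `lambda: gevent.sleep(0)` as yield_to_loop.
yields = []
result = load_rows_cooperatively(
    (f" row-{i} " for i in range(2500)),
    yield_to_loop=lambda: yields.append(time.monotonic()),
    chunk_size=1000,
)
```

This only helps if the loading can be split into small pieces; a single long call into a C extension or a blocking read would still hold the loop for its full duration.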