How not to use a variable as a global when multiprocessing in Python


My task is to build the right application architecture while using multiprocessing in Python. I will note right away that I use the multiprocess library, which relies on dill for serialization.

I have 2 files: initialization_manager.py (which I want to run as the main one) and worker_manager.py (with the WorkerManager class).

In initialization_manager.py I initialize and configure the logger and would like to use it throughout the project. Therefore, I create an instance of WorkerManager here and pass the logger to its constructor.

In WorkerManager I want to use multiprocessing, and I would like the logger to be global (though this seems like bad architecture to me). What should I do in such a situation to use a single logger?

The code below does not work because multiprocessing creates a copy of the logger (since I use 5 processes, it creates 5 different loggers). I have narrowed the problem down as much as possible.

File worker_manager.py (with the WorkerManager class, where I want to use the logger already created at the start of the project):

from multiprocess import Pool


class WorkerManager:
    def __init__(self, logger):
        self.logger = logger

    def start_workers(self):
        numbers = [1, 2, 3, 4, 5]
        with Pool(5) as pool:
            pool.map(self.perform_task, numbers)

    def perform_task(self, number):
        self.logger.debug(f"Number = {number}")

File initialization_manager.py (the file that I run and in which I declare the logger that I want to use in all other classes):

import logging
import sys

from worker_manager import WorkerManager

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('%(asctime)s . %(name)s . %(levelname)s . %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)


def start_project():
    worker_manager = WorkerManager(logger)
    worker_manager.start_workers()


if __name__ == '__main__':
    start_project()

After launching initialization_manager.py, no messages appear in the console. I would like to get the following instead:

[date] . worker_manager . DEBUG . Number = 1
[date] . worker_manager . DEBUG . Number = 2
[date] . worker_manager . DEBUG . Number = 3
[date] . worker_manager . DEBUG . Number = 4
[date] . worker_manager . DEBUG . Number = 5

1 Answer

Answered by dragon2fly:

In WorkerManager I want to use multiprocessing, and I would like the logger to be global (though this seems like bad architecture to me). What should I do in such a situation to use a single logger?

Not just for multiprocessing: the best practice is to separate the logging setup from the main code, i.e., configure logging from a file (JSON, YAML, ...) and load it with dictConfig. This removes a lot of headaches caused by misconfigured loggers and lets you concentrate on the actual problem in the main code.
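For example, a minimal dictConfig sketch that mirrors the handler/formatter setup from initialization_manager.py (in a real project the dict would be loaded from a JSON or YAML file; the inline dict here is just for illustration):

import logging.config

# dictConfig equivalent of the manual setup in initialization_manager.py;
# in practice this dict would come from json.load() or yaml.safe_load().
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'default': {
            'format': '%(asctime)s . %(name)s . %(levelname)s . %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'default',
            'stream': 'ext://sys.stdout',
        },
    },
    'root': {'level': 'DEBUG', 'handlers': ['console']},
}

logging.config.dictConfig(LOGGING_CONFIG)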

The code below does not work because multiprocessing creates a copy of the logger (since I use 5 processes, it creates 5 different loggers). I have narrowed the problem down as much as possible.

Yes, the problem quickly gets complicated with multiprocessing.

Logging just works only if you use the fork start method to create new processes. With the spawn method, the logger must be passed into the child processes. Since a logger is not picklable, only its name gets passed, and brand-new loggers with the same name are created in the child processes, as you have noticed. With default parameters, these loggers only output to the console at level WARNING or above.
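On Unix you can check this yourself by forcing the fork start method before the pool is created, so the children inherit the already-configured logger, handlers included (a sketch; fork is unavailable on Windows, and macOS defaults to spawn since Python 3.8):

import multiprocess

if __name__ == '__main__':
    # fork copies the parent's memory, so child processes see the logger
    # with its handlers still attached; spawn starts a fresh interpreter
    # and re-creates a bare logger by name instead.
    multiprocess.set_start_method('fork')  # Unix only
    start_project()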

If you want to go down the rabbit hole of logging plus multiprocessing, the standard library docs already have a nice example to get you started. The basic idea is to serialize the log records coming from all child processes and have a central process/thread do the actual logging.
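A minimal sketch of that idea with logging.handlers.QueueHandler and QueueListener, adapted to your setup (a Manager queue is used because, unlike a plain Queue, it can be passed through Pool arguments):

import logging
import logging.handlers
import sys

from multiprocess import Manager, Pool


def perform_task(queue, number):
    # In the child: the logger's only handler forwards records to the queue.
    logger = logging.getLogger('worker_manager')
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # pool workers are reused; add the handler once
        logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.debug(f"Number = {number}")


if __name__ == '__main__':
    queue = Manager().Queue()

    # The parent owns the real handler; the listener thread drains the
    # queue and performs the actual output.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s . %(name)s . %(levelname)s . %(message)s'))
    listener = logging.handlers.QueueListener(queue, handler)
    listener.start()

    with Pool(5) as pool:
        pool.starmap(perform_task, [(queue, n) for n in [1, 2, 3, 4, 5]])

    listener.stop()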

Then, to have a genuinely useful logger, you still have to make all processes use the same logging config, and log uncaught exceptions (crashes) in the child processes as well.

Or use logger_tt: configure your logging once and just focus on your main application from then on.
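A sketch of what that looks like (the parameters are from memory, so check logger_tt's README; use_multiprocessing=True is assumed to enable its queue-based multiprocessing support):

from logger_tt import setup_logging, logger

# One setup call in the main module; the same `logger` import is then
# used everywhere, including inside pool workers.
setup_logging(use_multiprocessing=True)

logger.debug('logged from the parent and from child processes alike')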