I'm trying to process pydantic objects with Python's multiprocessing module.
My task: I need to download many files. The URL of each file and some additional information, such as a boolean "file_downloaded", are stored in a data object that I created with pydantic. Now I want to download more than one file at once, so I want to build a list of these data objects and process them in a Pool with 5 processes, using the map function.
Here is a simple example (with errors):
import pydantic
from typing import Optional
import multiprocessing
from multiprocessing.managers import BaseManager


class data_object(pydantic.BaseModel):
    url: str
    downloaded: Optional[bool] = False


class CustomManager(BaseManager):
    pass


def downloader(single_data: data_object):
    single_data.downloaded = True


if __name__ == '__main__':
    # Simple single process test for data_object and worker (no errors)
    just_one_object = data_object(url='url1')
    print(just_one_object.downloaded)
    downloader(just_one_object)
    print(just_one_object.downloaded)

    # Multiprocesses with shared data_object
    CustomManager.register('data_object', data_object)
    CustomManager.register('list', list)

    with CustomManager() as manager:
        shared_single_object = manager.data_object(url='url2')  # Error occurs
        print(shared_single_object.downloaded)
        downloader(shared_single_object)
        print(shared_single_object.downloaded)

        managed_list = manager.list([manager.data_object(url='url'+str(v)) for v in range(5)])
        pool = multiprocessing.Pool(processes=5)
        pool.map(downloader, managed_list)
        pool.close()
        pool.join()
        print(managed_list)
When I run this example, I get the following error on the line that defines shared_single_object:
AttributeError: '__signature__' attribute of 'data_object' is class-only
Unfortunately, I have no idea where to start solving this error. After that line, I create multiple instances of the data object with different URLs and put them in a managed list; they should then be downloaded. There may be further problems in that part, since I wasn't able to run it.
I searched the internet for how to use a multiprocessing manager with a pydantic object, but I found nothing. I tried to adapt an example of sharing a complex class through the manager to implement the code above.
With plain dictionaries I was able to download multiple files at once, but I'd like to use pydantic.
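For comparison, here is a minimal sketch of what such a dictionary-based variant can look like. It is not my real code: the names `download_dict` and `download_all` and the URLs are placeholders, the actual download is left out, and it assumes each worker returns its updated dictionary instead of sharing it through a manager.

```python
import multiprocessing


def download_dict(entry: dict) -> dict:
    # Placeholder for the real download; only the flag is updated here.
    # A new dict is returned because changes made inside a worker process
    # are not visible in the parent process.
    return {**entry, "downloaded": True}


def download_all(entries: list, processes: int = 5) -> list:
    # Distribute the entries over a pool of worker processes and
    # collect the returned (updated) dictionaries in the original order.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(download_dict, entries)


if __name__ == "__main__":
    entries = [{"url": "url" + str(v), "downloaded": False} for v in range(5)]
    print(download_all(entries))
```

The dictionaries are pickled when they are sent to the workers and again when the results come back, so no manager is involved in this version.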
Thanks in advance.