I'm currently training a Top2Vec model on a CommonCrawl news dataset in Azure ML Studio. When I run my Python code in an .ipynb notebook inside ML Studio itself (online), the CPU is fully utilized (100% load), but when I execute my script as a task in a job, the CPU utilization (Monitoring) never exceeds 25%.
I've noticed that the "containerInstance" section in the full JSON job definition contains the resource settings for this container instance, which is configured as follows:
"containerInstance": {
"region": null,
"cpuCores": 2,
"memoryGb": 3.5
}
However, I'm somehow unable to launch a job with more than 2 cpuCores and 3.5 GB of RAM. My compute machine is a STANDARD_F4S_V2 instance with 4 vCPUs and 8 GB of RAM, so I expect my container instance to use all available resources instead of only 50%.
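As a quick sanity check (standard library only, independent of Azure), the script can log how many CPUs the process actually sees from inside the job; in a constrained container this can be lower than the VM's physical vCPU count:

```python
import os

# Logical CPUs visible to this process. Inside a resource-limited
# container this may be smaller than the host VM's vCPU count.
print("CPUs visible to Python:", os.cpu_count())
```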
These are the hyperparameters I use to train my model:
hyperparameters = {
    'min_count': 50,
    'topic_merge_delta': 0.1,
    'embedding_model': 'doc2vec',
    'embedding_batch_size': 32,
    'split_documents': False,
    'document_chunker': 'sequential',
    'chunk_length': 100,
    'chunk_overlap_ratio': 0.5,
    'chunk_len_coverage_ratio': 1,
    'speed': 'learn',
    'use_corpus_file': False,
    'keep_documents': True,
    'workers': 4,
    'verbose': True
}
Is there a possibility to edit the containerInstance options? I saw that I can configure "Process count per node", but that sounds like the number of times my script should be executed in parallel.
I finally got to the root of the problem. It was not the Docker container instance failing to use all cores, but my Python script. The script relied on Python's threading library to achieve parallel execution, but at the time I was unaware of the GIL (Global Interpreter Lock), which allows only one thread to hold control of the Python interpreter at a time; that, of course, threw off my understanding of threads in Python a bit. After rewriting the script with the multiprocessing library, the Docker container instance used all available resources.

Nonetheless, if you plan to manually define the number of CPU cores and the amount of RAM, you can use the Python script below to start your custom Azure ML job:
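(As a side note, separate from the job-submission script: the threading-vs-multiprocessing difference described above can be shown with a minimal, self-contained sketch. The `cpu_bound` workload here is hypothetical, not my actual training code.)

```python
import multiprocessing as mp
import threading
import time

def cpu_bound(n):
    # Pure-Python arithmetic loop: it holds the GIL the whole time,
    # so multiple threads running it cannot execute in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_with_threads(n, workers):
    # Threads are serialized by the GIL for CPU-bound work.
    threads = [threading.Thread(target=cpu_bound, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def run_with_processes(n, workers):
    # Separate processes each have their own interpreter and GIL,
    # so the work actually spreads across CPU cores.
    start = time.perf_counter()
    with mp.Pool(workers) as pool:
        pool.map(cpu_bound, [n] * workers)
    return time.perf_counter() - start

if __name__ == "__main__":
    N, WORKERS = 2_000_000, 4
    print(f"threads:   {run_with_threads(N, WORKERS):.2f}s")
    print(f"processes: {run_with_processes(N, WORKERS):.2f}s")
```

On a multi-core machine the process-based version should finish noticeably faster and drive all cores, which is what the container monitoring finally reflected after the rewrite.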