Why is AutoGluon not using the GPU?


I am running AutoGluon on a T4 GPU (CUDA 11.8) on Colab. Per the documentation, I've installed:

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install autogluon
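Before training, it's worth confirming that the CUDA build of torch actually got installed (a plain `pip install torch` can resolve to a CPU-only wheel, in which case nothing downstream will ever touch the GPU). A quick sanity check:

```python
import torch

# True only if torch was built with CUDA support AND a GPU is visible
print(torch.cuda.is_available())

# The CUDA version torch was built against, e.g. "11.8" for the cu118
# wheel; None for a CPU-only build
print(torch.version.cuda)
```

If `is_available()` prints False here, the problem is the torch install itself, not AutoGluon.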

I'm running the following code:

import pandas as pd
from autogluon.tabular import TabularPredictor

df = pd.read_csv("/content/autogluon train.csv")
predictor = TabularPredictor(label='Expense').fit(
    df, presets='best_quality', verbosity=4, time_limit=70000, num_gpus=1
)

Below are the latest logs output by AutoGluon:

Fitting model: CatBoost_BAG_L1 ... Training model for up to 51234.31s of the 51234.12s of remaining time.
    Dropped 0 of 1149 features.
    Dropped 0 of 1149 features.
    Fitting CatBoost_BAG_L1 with 'num_gpus': 1, 'num_cpus': 8
Saving AutogluonModels/ag-20231205_035929/models/CatBoost_BAG_L1/utils/model_template.pkl
Loading: AutogluonModels/ag-20231205_035929/models/CatBoost_BAG_L1/utils/model_template.pkl
Upper level total_num_cpus, num_gpus 8 | 1
    Dropped 0 of 1149 features.
minimum_model_resources: {'num_cpus': 1, 'num_gpus': 0.5}
user_cpu_per_job, user_gpu_per_job None | None
user_ensemble_cpu, user_ensemble_gpu None | None
Resources info for GpuResourceCalculator: {'resources_per_job': {'cpu': 4, 'gpu': 0.5}, num_parallel_jobs': 2.0, 'batches': 4, 'cpu_per_job': 4, 'gpu_per_job': 0.5}
    Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (2.0 workers, per: cpus=4, gpus=0, memory=9.85%)
Dispatching folds on node 04f0fb7d1c58be36abf13e3c6015fbbdc114b5a08ea22cb71ff98a38
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}

However, when I check nvidia-smi, it shows no processes using the GPU:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

There is 1 answer

Answer from alex:

I'm using the latest stable AutoGluon 1.0 on Windows 10 with Python 3.10.11 and have a similar problem: num_gpus=1 has no effect.

Try setting the GPU count this way instead (it helped in my case):

predictor = TabularPredictor(label='Expense').fit(df, presets='best_quality', verbosity=4, time_limit=70000, ag_args_fit={'num_gpus': 1})