I am running AutoGluon on a T4 GPU (CUDA 11.8) on Colab. Per the documentation, I've installed:
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install autogluon
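A quick sanity check like the one below (a sketch for a separate notebook cell) can confirm whether the CUDA build of torch is the one actually imported:

import torch

# Expect a +cu118 build string, CUDA available, and the T4 reported as device 0.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))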
I'm running the following code:
import pandas as pd
from autogluon.tabular import TabularPredictor

df = pd.read_csv("/content/autogluon train.csv")
predictor = TabularPredictor(label='Expense').fit(df, presets='best_quality', verbosity=4, time_limit=70000, num_gpus=1)
Below are the latest logs output by AutoGluon:
Fitting model: CatBoost_BAG_L1 ... Training model for up to 51234.31s of the 51234.12s of remaining time.
Dropped 0 of 1149 features.
Dropped 0 of 1149 features.
Fitting CatBoost_BAG_L1 with 'num_gpus': 1, 'num_cpus': 8
Saving AutogluonModels/ag-20231205_035929/models/CatBoost_BAG_L1/utils/model_template.pkl
Loading: AutogluonModels/ag-20231205_035929/models/CatBoost_BAG_L1/utils/model_template.pkl
Upper level total_num_cpus, num_gpus 8 | 1
Dropped 0 of 1149 features.
minimum_model_resources: {'num_cpus': 1, 'num_gpus': 0.5}
user_cpu_per_job, user_gpu_per_job None | None
user_ensemble_cpu, user_ensemble_gpu None | None
Resources info for GpuResourceCalculator: {'resources_per_job': {'cpu': 4, 'gpu': 0.5}, 'num_parallel_jobs': 2.0, 'batches': 4, 'cpu_per_job': 4, 'gpu_per_job': 0.5}
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (2.0 workers, per: cpus=4, gpus=0, memory=9.85%)
Dispatching folds on node 04f0fb7d1c58be36abf13e3c6015fbbdc114b5a08ea22cb71ff98a38
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
Folding resources per job {'num_gpus': 0.5, 'num_cpus': 4}
When I look at nvidia-smi, it shows no processes using the GPU:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
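Since a single nvidia-smi call is only a snapshot, a small background poller started in the same cell before fit() (a sketch; the interval and iteration count are arbitrary) can rule out short-lived GPU activity:

import subprocess
import threading
import time

def log_gpu(interval_s=30, iterations=20):
    # Periodically print GPU utilization and memory so brief activity isn't missed.
    for _ in range(iterations):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        print(result.stdout.strip())
        time.sleep(interval_s)

threading.Thread(target=log_gpu, daemon=True).start()
# ...then call predictor.fit(...) in the same cell.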
I'm using the latest stable AutoGluon 1.0 on Windows 10 with Python 3.10.11 and have a similar problem: num_gpus=1 has no effect.
Try setting the number of GPUs in this form instead (it helps in my case):
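For example, requesting the GPU through ag_args_fit so the setting propagates to every trained model, rather than as a bare fit() keyword. This is only a sketch against the TabularPredictor API, reusing the label and CSV path from the question:

import pandas as pd
from autogluon.tabular import TabularPredictor

df = pd.read_csv("/content/autogluon train.csv")

# ag_args_fit entries are passed to every model's fit call;
# num_gpus/num_cpus here control the per-model resource request.
predictor = TabularPredictor(label='Expense').fit(
    df,
    presets='best_quality',
    time_limit=70000,
    ag_args_fit={'num_gpus': 1, 'num_cpus': 8},
)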