I am following this tutorial (which is basically this) in order to use Ray Tune for hyperparameter optimization. My model trains fine on the GPU without the optimization, but now I want to optimize.
I applied the tutorial to my code, but when I try to kick off the run:
result = tune.run(
    train,
    resources_per_trial={"gpu": 1},
    config=config,
    num_samples=10,
    scheduler=scheduler,
    progress_reporter=reporter,
    checkpoint_at_end=False,
)
I'm stuck with:
TuneError: Insufficient cluster resources to launch trial: trial requested 1 CPUs, 1 GPUs, but the cluster has only 6 CPUs, 0 GPUs, 12.74 GiB heap, 4.39 GiB objects (1.0 node:XXX).
But when I take a look at the Ray dashboard:
both GPUs are clearly listed.
Why isn't Ray Tune seeing my GPUs? How do I make this work?
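One thing worth ruling out first (an assumption, not something visible in the question): Ray's GPU autodetection respects CUDA_VISIBLE_DEVICES, so if that variable happens to be set to an empty string in the environment where ray.init() runs, the cluster will report 0 GPUs even though nvidia-smi and the dashboard show two. A minimal sketch with a hypothetical helper, visible_gpu_count, that mirrors how the variable is interpreted:

```python
import os

def visible_gpu_count(env=None):
    """Interpret CUDA_VISIBLE_DEVICES the way CUDA (and hence Ray) does.

    Returns None if the variable is unset (all GPUs visible), otherwise
    the number of device IDs it actually exposes.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: all physical GPUs are visible
    # Set but empty ("") means *no* GPUs are visible.
    ids = [v for v in value.split(",") if v.strip()]
    return len(ids)

print(visible_gpu_count())  # 0 here would explain "cluster has ... 0 GPUs"
```

If this prints 0, unsetting the variable (or setting it to "0,1") before starting Ray should make both GPUs visible again.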
Specs:
GPU 0: TITAN Xp
GPU 1: GeForce GTX 1080 Ti
CUDA 10.1
Python 3.7
PyTorch 1.7
Debian 9.12
ray tune 1.0.1.post1
//edit:
>>> ray.init(num_gpus=1)
>>> ray.get_gpu_ids()
[]
I would suggest checking the placement_strategy and max_t settings. The first can cause freezes depending on your system configuration, and the second can simply make trials exceed the total time allotted for computation.