Ray: How to run many actors on one GPU?

I have only one GPU, and I want to run many actors on that GPU. Here's what I do using Ray, following https://ray.readthedocs.io/en/latest/actors.html

  1. First, define the network on the GPU (a minimal, TensorFlow-free check of the GPU assignment this relies on is sketched after this list):
import atexit
import os

import ray
import tensorflow as tf

class Network():
    def __init__(self, ***some args here***):
        self._graph = tf.Graph()
        # restrict TensorFlow to the GPU(s) that Ray assigned to this process
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join([str(i) for i in ray.get_gpu_ids()])
        with self._graph.as_default():
            with tf.device('/gpu:0'):
                # network, loss, and optimizer are defined here

        sess_config = tf.ConfigProto(allow_soft_placement=True)
        sess_config.gpu_options.allow_growth = True
        self.sess = tf.Session(graph=self._graph, config=sess_config)
        self.sess.run(tf.global_variables_initializer())
        atexit.register(self.sess.close)

        self.variables = ray.experimental.TensorFlowVariables(self.loss, self.sess)
  2. Then define the worker class:
@ray.remote(num_gpus=1)
class Worker(Network):
    # do something
  3. Define the learner class:
@ray.remote(num_gpus=1)
class Learner(Network):
    # do something
  4. Finally, the train function:
def train():
    ray.init(num_gpus=1)
    learner = Learner.remote(...)
    workers = [Worker.remote(...) for i in range(10)]
    # do something
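
As a sanity check on the ray.get_gpu_ids() call that step 1 relies on, here is a minimal, TensorFlow-free actor that only reports which GPU Ray hands it. The class name GpuCheck is made up for illustration, and the script assumes declaring a single GPU via ray.init(num_gpus=1) is valid on the machine:

import ray

ray.init(num_gpus=1)

@ray.remote(num_gpus=1)
class GpuCheck:
    def gpu_ids(self):
        # inside the actor, this returns the GPU index Ray reserved for it, e.g. [0]
        return ray.get_gpu_ids()

checker = GpuCheck.remote()
print(ray.get(checker.gpu_ids.remote()))  # expected to print something like [0]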

This setup works fine as long as I don't try to put anything on the GPU; that is, it works if I remove every with tf.device('/gpu:0') block and all the num_gpus=1 arguments. The trouble arises when I keep them: only the learner seems to be created, and none of the workers is ever constructed. What should I do to make this work?
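
My guess, from reading the resource section of the docs, is that ray.init(num_gpus=1) declares a single GPU while every @ray.remote(num_gpus=1) actor reserves a whole one, so Ray can only ever place one of the eleven actors. If that is the cause, fractional GPU requests look like the relevant mechanism. Below is an untested sketch under that assumption, with DummyLearner/DummyWorker standing in for my real classes and the 0.1/0.09 shares chosen arbitrarily so the total stays within 1.0:

import ray

ray.init(num_gpus=1)

# 0.1 + 10 * 0.09 = 1.0, so one learner and ten workers fit on the single declared GPU
@ray.remote(num_gpus=0.1)
class DummyLearner:
    def gpu_ids(self):
        return ray.get_gpu_ids()

@ray.remote(num_gpus=0.09)
class DummyWorker:
    def gpu_ids(self):
        return ray.get_gpu_ids()

learner = DummyLearner.remote()
workers = [DummyWorker.remote() for _ in range(10)]
# all eleven actors should report the same physical GPU, e.g. [0]
print(ray.get([learner.gpu_ids.remote()] + [w.gpu_ids.remote() for w in workers]))

Is fractional num_gpus the intended way to share one GPU between many actors, or is there something else I should change?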
