Initialize struct on different GPUs


I have a struct that looks like this

struct LstmLayer {
  int deviceId;
  thrust::device_vector <real_t> W;
  thrust::device_vector <real_t> gradW;

  LstmLayer() : deviceId(0) {}

  LstmLayer(int __deviceId__) : deviceId(__deviceId__) {}

  void setDevice(int __deviceId__) { deviceId = __deviceId__; }

  void init(bool initParams) {
    W.resize(4*lstmSize * 2*lstmSize);
    gradW.resize(4*lstmSize * 2*lstmSize);

    if (initParams) GPU_Random_Vector(W);
  }
};

Now I want to initialize an array of LstmLayer, with each element on a different GPU device. I do it as follows

  struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS];

  for (int i = 0; i < numLstmLayers; ++i) {
    CUDA_SAFE_CALL(cudaSetDevice(i));
    lstmLayers[i].setDevice(i);
    lstmLayers[i].init(true);
  }

Running this program gives the following error

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  driver shutting down

Please tell me what's wrong with my code and how to do it right. Thank you in advance.


Accepted answer (talonmies):

The problem is that you are instantiating all of the vectors in the same CUDA GPU context and then trying to use them in a different context. This happens because the default constructor for each device_vector runs when you define the array of structures, before any call to cudaSetDevice. To annotate your code:

struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS]; // default constructor for every device_vector runs here, all in one GPU context

for (int i = 0; i < numLstmLayers; ++i) {
    CUDA_SAFE_CALL(cudaSetDevice(i));
    lstmLayers[i].setDevice(i);
    lstmLayers[i].init(true); // error here: you changed to a different device, then called resize on vectors constructed elsewhere
}
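For intuition, the same failure mode can be reproduced without the struct. A minimal sketch (my illustration, not from the original answer; it assumes at least two visible GPUs, and the exact symptom can vary with CUDA version and peer-access support):

#include <thrust/device_vector.h>

int main() {
    cudaSetDevice(0);
    thrust::device_vector<float> v(16); // storage allocated in device 0's context

    cudaSetDevice(1);
    v.resize(1024); // the reallocation, copy, and free now mix device 0's
                    // and device 1's contexts, just like the array of structs
    return 0;
}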

The solution is probably to hold the device vectors as pointers and construct them explicitly in your init method, after cudaSetDevice has made the intended device current. There are a lot of different ways this could be done, for example:

struct LstmLayer {
  int deviceId;
  thrust::device_vector <real_t> *W;
  thrust::device_vector <real_t> *gradW;

  LstmLayer() : deviceId(0), W(nullptr), gradW(nullptr) {}

  LstmLayer(int __deviceId__) : deviceId(__deviceId__), W(nullptr), gradW(nullptr) {}

  void setDevice(int __deviceId__) { deviceId = __deviceId__; }

  void init(bool initParams) {
    // The vectors are now constructed here, inside whatever device
    // context is current when init is called.
    W = new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize);
    gradW = new thrust::device_vector<real_t>(4*lstmSize * 2*lstmSize);

    if (initParams) GPU_Random_Vector(*W); // dereference: W is now a pointer
  }
};

[disclaimer: written in browser, never compiled, use at own risk]

Obviously you would need to define a destructor to prevent memory leaks. There are other possibilities; I'll leave those as an exercise for the reader.
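As one such possibility, here is a minimal sketch of a destructor (my addition, not from the original answer): it makes the layer's own device current before freeing, on the assumption that the vectors' storage belongs to that device's context:

~LstmLayer() {
  // Assumption: the vectors were constructed while deviceId was current,
  // so switch back to that device before destroying them.
  if (W != nullptr || gradW != nullptr) cudaSetDevice(deviceId);
  delete W;
  delete gradW;
}

Because the struct now owns raw pointers, you would also want to delete (or properly define) its copy constructor and copy assignment operator, or hold the vectors in std::unique_ptr members instead.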