Functional keyword for a unified memory allocation cuda


I am starting out with CUDA programming. As a first step toward implementing a particle integrator, I wrote an integrator class that holds the particle data and should be able to integrate it. The data comes from another container class, and I want to allocate it in unified memory. For this purpose I have a member function '_allocate'; all it does is call cudaMallocManaged for the member variables. Now I am wondering which functional keyword (__host__, __device__, or both) I should put on this function.

I read that you cannot use __global__ in a class definition. Right now I am using both __host__ and __device__, since unified memory should be accessible from both host and device, but I'm not sure this is the correct way.

This is the class I'd like to implement this in:


template <typename T>
class Leapfrog : public Integrator<T> {
  public:

   ...

  private:
    T *positions; 
    T *masses; 
    T *velocities; 
    T *types; 
    __device__ __host__ bool _allocate();
    __device__ __host__ bool _free();
    __device__ __host__ bool _load_data();
};

// allocates space on the unified memory for the 
// private variables positions, masses, velocities, types

template <typename T>
__host__ __device__ bool Leapfrog<T>::_allocate(){
  cudaMallocManaged(&positions, particleset.N*3*sizeof(T));
  cudaMallocManaged(&masses, particleset.N*sizeof(T));
  cudaMallocManaged(&velocities, particleset.N*3*sizeof(T));
  cudaMallocManaged(&types, particleset.N*sizeof(T));
  return true;  // return type now matches the declaration; error checking still to be added
}

I don't know if this is relevant for the functional keyword, but I also want to check the cudaError_t returned by each allocation to see whether it succeeded.

Answer by Oblivion (best answer):

Every callable that is called only on the device should be decorated with __device__, and every callable that is called only on the host should be decorated with __host__.

Use __host__ __device__ only for callables that will be called on both host and device.
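As a rough illustration of the rule above, here is a minimal sketch of the specifiers side by side. All function names are made up for this example and do not come from the question.

```cuda
#include <cuda_runtime.h>

// Callable from device code only (kernels and other __device__ functions).
__device__ float square_on_device(float x) { return x * x; }

// Callable from host code only. Writing no specifier at all means the same.
__host__ float square_on_host(float x) { return x * x; }

// Compiled for both sides, so host code and kernels can both call it.
__host__ __device__ float kinetic_energy(float m, float v) {
  return 0.5f * m * v * v;
}

// A kernel is __global__: launched from the host, executed on the device.
// It must be a free function, not a class member.
__global__ void energy_kernel(const float *m, const float *v, float *e, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) e[i] = kinetic_energy(m[i], square_on_device(v[i]) / v[i]);
}
```

The kernel calls both a __device__ and a __host__ __device__ function, while square_on_host could only be used from ordinary CPU code.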

cudaMallocManaged is host only code:

__host__ cudaError_t cudaMallocManaged ( void** devPtr, size_t size, unsigned int flags = cudaMemAttachGlobal )
Allocates memory that will be automatically managed by the Unified Memory system.

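So your _allocate can only work on the host: declare it __host__ (or leave it undecorated, which means the same thing). The memory it allocates is still accessible from both host and device; only the allocation call itself is restricted to the host.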
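Putting this together with the error check asked about in the question, a host-only version could look roughly like the sketch below. It is a simplified stand-in, not the original class: the Integrator<T> base is left out, the constructor argument n replaces particleset.N, the data members are public only to keep the example short, and cuda_ok is a hypothetical helper.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA call and returns false.
inline bool cuda_ok(cudaError_t err) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return false;
  }
  return true;
}

// Simplified stand-in for the Leapfrog class from the question.
template <typename T>
class Leapfrog {
  public:
    explicit Leapfrog(std::size_t n) : n_(n) {}  // n replaces particleset.N (assumption)
    ~Leapfrog() { _free(); }
    Leapfrog(const Leapfrog &) = delete;             // the object owns its buffers,
    Leapfrog &operator=(const Leapfrog &) = delete;  // so copying is disabled

    // Host only, because cudaMallocManaged is a __host__ function.
    // Stops at the first failing allocation and returns false.
    __host__ bool _allocate() {
      return cuda_ok(cudaMallocManaged(&positions, n_ * 3 * sizeof(T))) &&
             cuda_ok(cudaMallocManaged(&masses, n_ * sizeof(T))) &&
             cuda_ok(cudaMallocManaged(&velocities, n_ * 3 * sizeof(T))) &&
             cuda_ok(cudaMallocManaged(&types, n_ * sizeof(T)));
    }

    // Host only. cudaFree on a null pointer performs no operation.
    __host__ bool _free() {
      bool all_ok = true;
      T **ptrs[] = {&positions, &masses, &velocities, &types};
      for (T **p : ptrs) {
        all_ok = cuda_ok(cudaFree(*p)) && all_ok;
        *p = nullptr;
      }
      return all_ok;
    }

    T *positions = nullptr;
    T *masses = nullptr;
    T *velocities = nullptr;
    T *types = nullptr;

  private:
    std::size_t n_;
};
```

Host code would call _allocate() once and check the returned bool; kernels then take the raw pointers (for example positions) as arguments, since managed memory can be dereferenced on both sides.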