I am starting out with CUDA programming, and as a first step toward implementing a particle integrator, I made an integrator class which holds data about particles and should be able to integrate them. The data comes from another container class, and I want to allocate this data in unified memory. For this purpose I have a member function '_allocate'; all it does is call cudaMallocManaged for the member variables. Now I am wondering which execution space qualifier I should put on this function.
I read that you cannot use '__global__' on a member function in a class definition. Right now I am using both __host__ and __device__, since unified memory should be accessible to both host and device, but I'm not sure if this is the correct way.
This is the class I'd like to implement this in:
template <typename T>
class Leapfrog : public Integrator<T> {
public:
    ...
private:
    T *positions;
    T *masses;
    T *velocities;
    T *types;
    __device__ __host__ bool _allocate();
    __device__ __host__ bool _free();
    __device__ __host__ bool _load_data();
};
// allocates space in unified memory for the
// private members positions, masses, velocities, types
template <typename T>
__host__ __device__ bool Leapfrog<T>::_allocate(){
    cudaMallocManaged(&positions, particleset.N*3*sizeof(T));
    cudaMallocManaged(&masses, particleset.N*sizeof(T));
    cudaMallocManaged(&velocities, particleset.N*3*sizeof(T));
    cudaMallocManaged(&types, particleset.N*sizeof(T));
    return true;
}
I don't know whether this is relevant to the choice of qualifier, but I also want to check the cudaError_t returned after each allocation to see whether it succeeded.
A callable that can only be called from the device should be declared __device__, and one that can only be called from the host should be declared __host__ (which is also the default when no qualifier is given). You use __host__ __device__ only for callables that must be compiled for, and called from, both host and device.

cudaMallocManaged is a host-only API, so your _allocate can only work on the host: declare it __host__ (or leave it unqualified). The fact that unified memory is accessible from both host and device does not mean the allocation call itself can run on the device.
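As a rough sketch (assuming, as in your question, a particleset member with a field N, and keeping your member names), a host-only _allocate with error checking could look like this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Host-only: cudaMallocManaged must be called from host code,
// but the pointers it returns are usable from both host and device.
template <typename T>
__host__ bool Leapfrog<T>::_allocate() {
    cudaError_t err;
    if ((err = cudaMallocManaged(&positions,  particleset.N * 3 * sizeof(T))) != cudaSuccess ||
        (err = cudaMallocManaged(&masses,     particleset.N * sizeof(T)))     != cudaSuccess ||
        (err = cudaMallocManaged(&velocities, particleset.N * 3 * sizeof(T))) != cudaSuccess ||
        (err = cudaMallocManaged(&types,      particleset.N * sizeof(T)))     != cudaSuccess) {
        // Report which error occurred; pointers already allocated are
        // left for _free() to release.
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

The same reasoning applies to _free (cudaFree is likewise called from the host for managed allocations), so it should be __host__ as well.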