how to get the available memory on the device

3.3k views Asked by At

I'm trying to obtain how much free memory I have on the device. To do this I call the cuda function cuMemGetInfo from a fortran code, but it returns negative values for the free amount of memory, so there's clearly something wrong. Does anyone know how I can do that? Thanks

EDIT:

Sorry, in fact my question was not very clear. I'm using OpenACC in Fortran and I call the C++ cuda function cudaMemGetInfo. Finally I could fix the code, the problem was effectively the kind of variables that I was using. Switching to size_ fixed everything. This is the interface in fortran that I'm using:

interface
subroutine get_dev_mem(total,free) bind(C,name="get_dev_mem")
    use iso_c_binding
        integer(kind=c_size_t)::total,free
end subroutine get_dev_mem
end interface

and this the cuda code

#include <cuda.h>
#include <cuda_runtime.h>

extern "C" {
void get_dev_mem(size_t& total, size_t& free) 
{
    cuMemGetInfo(&free, &total);
}
}

There's one last question: I pushed an array on the gpu and I checked its size using cuMemGetInfo, then I computed it's size counting the number of bytes, but I don't have the same answer, why? In the first case it is 3052mb large, in the latter 3051mb. This difference of 1mb could be the size of the array descriptor? Here there's the code that I used:

integer, parameter:: long = selected_int_kind(12)
integer(kind=c_size_t) :: total, free1,free2
real(8), dimension(:),allocatable::a
integer(kind=long)::N, eight, four

allocate(a(four*N))

!some OpenACC stuff in order to init the gpu
call get_dev_mem(total,free1)

!$acc data copy(a)

call get_dev_mem(total,free2) 
print *,"size a in the gpu = ",(free1-free2)/1024/1024, " mb"
print *,"size a in theory  = ", (eight*four*N)/1024/1024, " mb"

!$acc end data
deallocate(a)
1

There are 1 answers

0
einpoklum On

Right, so, like commenters have suggested, we're not sure exactly what you're running, but filling in the missing details by guessing, here's a shot:

Most CUDA API calls return a status code (or error code if you will); this is true both in C/C++ and in Fortran, as we can see in the Portland Group's CUDA Fortran Manual:

Most of the runtime API routines are integer functions that return an error code; they return a value of zero if the call was successful, and a nonzero value if there was an error. To interpret the error codes, refer to “Error Handling,” on page 48.

This is the case for cudaMemGetInfo() specifically:

integer function cudaMemGetInfo( free, total )
    integer(kind=cuda_count_kind) :: free, total

The two integers for free and total are cuda_count_kind, which if I am not mistaken are effectively unsigned... anyway, I would guess that what you're getting is an error code. Have a look at the Error Handling section on page 48 of the manual.