I was trying the new PGI community release (17.4) with a toy example (see below), and I'm getting an error inside the CUDA driver API when calling acc_init.
The code to reproduce the error is:
#include <openacc.h>
#include <cuda_runtime_api.h>
#include <stdio.h>
int main(void)
{
    acc_init( acc_device_nvidia );
    int ndev = acc_get_num_devices( acc_device_nvidia );
    printf("Num OpenACC devices: %d\n", ndev);
    cudaGetDeviceCount(&ndev);
    printf("Num CUDA devices: %d\n", ndev);
    return 0;
}
Compiled with:
/usr/local/pgi/linux86-64/17.4/bin/pgcc -acc -ta=tesla -Mcuda ./test.c -o oacc_test.pgi
cuda-memcheck output:
$ cuda-memcheck ./oacc_test.pgi
========= CUDA-MEMCHECK
========= Program hit CUDA_ERROR_INVALID_DEVICE (error 101) due to "invalid device ordinal" on CUDA API call to cuDevicePrimaryCtxRetain.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuDevicePrimaryCtxRetain + 0x15c) [0x1e8d1c]
========= Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccnc.so (__pgi_uacc_cuda_initdev + 0x80b) [0x6f0b]
========= Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccg.so (__pgi_uacc_enumerate + 0x148) [0x11388]
========= Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccg.so (__pgi_uacc_initialize + 0x5b) [0x117ab]
========= Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccapi.so (acc_init + 0x22) [0xe4f2]
========= Host Frame:./oacc_test.pgi [0xbc4]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf1) [0x202b1]
========= Host Frame:./oacc_test.pgi [0xaca]
=========
Num OpenACC devices: 1
Num CUDA devices: 1
========= ERROR SUMMARY: 1 error
Apparently __pgi_uacc_cuda_initdev is passing '-1' as the second parameter (CUdevice dev) to cuDevicePrimaryCtxRetain (bug?):
Breakpoint 1, 0x00007ffff4ab0bc0 in cuDevicePrimaryCtxRetain () from /usr/lib/x86_64-linux-gnu/libcuda.so
(cuda-gdb) p /x $rsi
$7 = 0xffffffff
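For anyone checking the arithmetic: CUdevice is a typedef for int in cuda.h, and on x86-64 Linux the second integer argument is passed in %rsi, so only the low 32 bits matter. A minimal sketch of that reinterpretation (the helper name low32_as_int is made up for illustration and is not part of CUDA or PGI):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the low 32 bits of a 64-bit register value
   as a signed 32-bit integer, the way an int argument would be read. */
static int low32_as_int(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;   /* keep only the low 32 bits */
    int32_t value;
    memcpy(&value, &low, sizeof value);  /* reinterpret the bits without a lossy cast */
    return (int)value;
}
```

Calling low32_as_int(0xffffffff) gives -1, which matches the "invalid device ordinal" message in the cuda-memcheck output.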
I suppose this isn't normal. Is this a bug in 17.4, or is my installation broken?
It's normal and a benign error. Basically, the PGI runtime is checking whether a CUDA context has already been created. Since there isn't a CUDA runtime call that just queries the existence of a context, we call "cuDevicePrimaryCtxRetain". If it returns an error, we know that we need to create a new context.
Note that in PGI release 17.7 we changed this call a bit, so you will no longer see the error when running cuda-memcheck.
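To illustrate the general "probe by trying" pattern described above, here is a rough sketch. It is not the PGI runtime's code: retain_fn, probe_context and the two fake back ends are hypothetical stand-ins, and the real driver call has a different signature. The point is only that a failed retain is treated as information (nothing to reuse) rather than as a fatal error.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver-style "retain" call: returns 0 on
   success and a nonzero error code otherwise. Not a real CUDA signature. */
typedef int (*retain_fn)(void **ctx, int dev);

enum ctx_action { CTX_REUSE_EXISTING, CTX_CREATE_NEW };

/* Attempt the retain; if it fails, report that a new context is needed. */
static enum ctx_action probe_context(retain_fn retain, int dev, void **ctx)
{
    *ctx = NULL;
    if (retain(ctx, dev) == 0 && *ctx != NULL)
        return CTX_REUSE_EXISTING;
    *ctx = NULL;            /* leave no half-initialized handle behind */
    return CTX_CREATE_NEW;
}

/* Fake back end that always succeeds, used only to exercise the sketch. */
static int fake_retain_ok(void **ctx, int dev)
{
    static int dummy_context;
    (void)dev;
    *ctx = &dummy_context;
    return 0;
}

/* Fake back end that always fails, used only to exercise the sketch. */
static int fake_retain_invalid_device(void **ctx, int dev)
{
    (void)ctx;
    (void)dev;
    return 101;  /* same numeric value as the error reported in the log above */
}
```

With fake_retain_invalid_device and a device of -1, probe_context returns CTX_CREATE_NEW. A tool that records every failing API call, as cuda-memcheck does, would still log the failed probe even though the caller handles it.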