OpenCL : Random CL_MEM_OBJECT_ALLOCATION_FAILURE upon clEnqueueNDRangeKernel


I have five kernels that keep processing a finite amount of data. Multiple cl_mem objects are created; some are used by a single kernel and some are shared across kernels. I keep getting CL_MEM_OBJECT_ALLOCATION_FAILURE when enqueuing the 3rd kernel. However, when I reduce the amount of data, the error occurs when enqueuing the 4th kernel instead (the 3rd kernel enqueue then works fine). None of the clCreateBuffer calls return an error.

I suspected a memory issue. For the first (larger) data set, almost 42 MB of global memory (cl_mem objects) had been allocated before the 3rd kernel enqueue failed. For the second (smaller) data set, only 1.48 MB of global memory had been allocated before the 4th kernel enqueue failed. My device capability queries report CL_DEVICE_MAX_MEM_ALLOC_SIZE as 256 MByte and CL_DEVICE_GLOBAL_MEM_SIZE as 1024 MByte, so I am allocating far less than these limits.

Fearing a problem in the kernel code, I commented out the entire kernel body, leaving only the parameters, and I still get the same error. So I am completely lost in understanding this issue. The callback notification function set on the context (in clCreateContext) didn't provide any additional details. Is there any way to find out which memory object allocation failed, and why?

Thanks in advance

I am running OpenCL 1.1. These are the device details:

-----------------------------------------------------------
Device Details
-----------------------------------------------------------
  CL_DEVICE_NAME:           GeForce GTX 460
  CL_DEVICE_VENDOR:             NVIDIA Corporation
  CL_DRIVER_VERSION:            340.62
  CL_DEVICE_VERSION:            OpenCL 1.1 CUDA
  CL_DEVICE_OPENCL_C_VERSION:           OpenCL C 1.1 
  CL_DEVICE_TYPE:           GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:      7
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:       3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:    1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:    1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:    1350 MHz
  CL_DEVICE_ADDRESS_BITS:       32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:     256MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:        1024MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:     local
  CL_DEVICE_LOCAL_MEM_SIZE:     47KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64KByte
  CL_DEVICE_QUEUE_PROPERTIES:       CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:       CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:      1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:    128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:   8
 -----------------------------------------------------------

There are 2 answers

Answer by vpa1977:

clCreateBuffer does not necessarily allocate the buffer on the device, so you may not get an error at buffer creation time. The error is returned later, when the buffer is first used, for example when you call clEnqueueWriteBuffer. That does not help much in finding the cause either, since the return codes are pretty vague. I would recommend stepping through your application with a tool such as CodeXL or gDEBugger.
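Since the failure only shows up at a later call, it can help to label every checked call so the log at least says which enqueue reported which status. The following is a minimal sketch of that idea, not part of the OpenCL API: cl_status_name and cl_check are hypothetical helper names, and the status values are copied from the OpenCL 1.1 header so the sketch compiles without an SDK (real host code would include CL/cl.h instead).

```c
#include <stdio.h>

/* Status values as defined in the OpenCL 1.1 header CL/cl.h.
   Defined here only so the sketch builds without the SDK installed. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                        0
#define CL_MEM_OBJECT_ALLOCATION_FAILURE (-4)
#define CL_OUT_OF_RESOURCES              (-5)
#define CL_OUT_OF_HOST_MEMORY            (-6)
#define CL_INVALID_MEM_OBJECT            (-38)
#define CL_INVALID_BUFFER_SIZE           (-61)
#endif

/* Hypothetical helper: map a few memory-related status codes to their names. */
const char *cl_status_name(int status)
{
    switch (status) {
    case CL_SUCCESS:                       return "CL_SUCCESS";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    case CL_INVALID_MEM_OBJECT:            return "CL_INVALID_MEM_OBJECT";
    case CL_INVALID_BUFFER_SIZE:           return "CL_INVALID_BUFFER_SIZE";
    default:                               return "unrecognized status";
    }
}

/* Hypothetical helper: log a failed call together with a caller-supplied label.
   Returns 1 if the status is CL_SUCCESS, 0 otherwise. */
int cl_check(int status, const char *what)
{
    if (status == CL_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", what, cl_status_name(status), status);
    return 0;
}
```

In host code, the return value of each clEnqueueWriteBuffer or clEnqueueNDRangeKernel call would be passed to cl_check with a label naming the kernel and the buffers it uses. This does not tell you which cl_mem object failed to allocate, but it narrows the failure down to one command and its arguments.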

Answer by shirleyYim:

I ran into the same problem. In my case I called clEnqueueNDRangeKernel in a loop and called clCreateBuffer before it on every iteration, but I never released the buffers, so device memory was eventually exhausted after the loop had run for a long time. The fix is to call clCreateBuffer once, outside the loop, and only write to the buffer inside the loop (for example with clEnqueueWriteBuffer).
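To see why the per-iteration clCreateBuffer adds up, here is a small host-side simulation of the two patterns. It is only a sketch: buffer_tally, tally_create, tally_release, leaky_loop and reuse_loop are hypothetical names, no OpenCL call is actually made, and the comments mark where the real calls would sit.

```c
#include <stddef.h>

/* Hypothetical host-side tally of buffers created but not yet released. */
typedef struct {
    size_t live_bytes;  /* bytes currently held by unreleased buffers */
    size_t peak_bytes;  /* highest value live_bytes has reached */
} buffer_tally;

/* Would be called next to every successful clCreateBuffer. */
void tally_create(buffer_tally *t, size_t bytes)
{
    t->live_bytes += bytes;
    if (t->live_bytes > t->peak_bytes)
        t->peak_bytes = t->live_bytes;
}

/* Would be called next to the matching clReleaseMemObject. */
void tally_release(buffer_tally *t, size_t bytes)
{
    t->live_bytes = (bytes > t->live_bytes) ? 0 : t->live_bytes - bytes;
}

/* Leaky pattern: a new buffer on every iteration, never released.
   Returns the bytes still held when the loop ends. */
size_t leaky_loop(size_t iterations, size_t bytes)
{
    buffer_tally t = {0, 0};
    for (size_t i = 0; i < iterations; ++i) {
        tally_create(&t, bytes);   /* stands in for clCreateBuffer */
        /* clEnqueueNDRangeKernel would go here */
    }
    return t.live_bytes;
}

/* Fixed pattern: one buffer created before the loop and reused.
   Returns the peak number of bytes held at any time. */
size_t reuse_loop(size_t iterations, size_t bytes)
{
    buffer_tally t = {0, 0};
    tally_create(&t, bytes);       /* stands in for a single clCreateBuffer */
    for (size_t i = 0; i < iterations; ++i) {
        /* clEnqueueWriteBuffer and clEnqueueNDRangeKernel would go here */
    }
    tally_release(&t, bytes);      /* stands in for clReleaseMemObject */
    return t.peak_bytes;
}
```

With a 1 MiB buffer and 1000 iterations, the leaky loop ends up holding 1000 MiB while the reuse loop never holds more than 1 MiB. Keeping a tally like this next to the real clCreateBuffer and clReleaseMemObject calls is one way to notice that buffers are accumulating before the device runs out of memory.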