I have 5 kernels that process a finite amount of data. Multiple cl_mem objects are created; some are used by only a single kernel and some are shared across kernels. I keep getting CL_MEM_OBJECT_ALLOCATION_FAILURE while enqueuing the 3rd kernel. However, when I reduce the data, I get the error while enqueuing the 4th kernel instead (the 3rd kernel enqueue works fine). None of the clCreateBuffer calls return an error.

I suspected a memory issue. For the first (larger) data set, almost 42MB of global memory (cl_mem objects) had been allocated before the 3rd kernel enqueue failed. For the second (smaller) data set, only 1.48MB of global memory had been allocated before the 4th kernel enqueue failed. My device capability queries report CL_DEVICE_MAX_MEM_ALLOC_SIZE as 256MByte and CL_DEVICE_GLOBAL_MEM_SIZE as 1024MByte, so I am allocating far less than these limits.

Fearing it could be a problem in the kernel code, I commented out the entire kernel body, leaving only the parameters, and I still get the same error. So I am completely lost in understanding this issue. The callback notification function set on the context (in clCreateContext) didn't provide any additional details.

Is there any way to find out which memory object allocation failed, and for what reason?
Thanks in advance
Running OpenCL 1.1. These are the device details:
-----------------------------------------------------------
Device Details
-----------------------------------------------------------
CL_DEVICE_NAME: GeForce GTX 460
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 340.62
CL_DEVICE_VERSION: OpenCL 1.1 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1
CL_DEVICE_TYPE: GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 7
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1350 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 47KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
-----------------------------------------------------------
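For reference, the size comparison described above can be sketched in plain C as below. This is only an illustration: `buffer_record`, `budget_report` and `check_budget` are hypothetical names, and the only figures taken from this post are the roughly 42MB total and the two device limits. It checks each buffer against CL_DEVICE_MAX_MEM_ALLOC_SIZE and the running total against CL_DEVICE_GLOBAL_MEM_SIZE.

```c
#include <stddef.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical record for one cl_mem object created with clCreateBuffer. */
typedef struct {
    const char *label;        /* illustrative name, not an OpenCL concept */
    unsigned long long size;  /* size in bytes requested for the buffer */
} buffer_record;

typedef struct {
    unsigned long long total;    /* sum of all requested buffer sizes */
    unsigned long long largest;  /* largest single buffer */
    int over_max_alloc;          /* 1 if any buffer exceeds max_alloc */
    int over_global_mem;         /* 1 if the total exceeds global_mem */
} budget_report;

/* Compare the requested buffers with the two limits reported by
   clGetDeviceInfo (passed in as plain numbers so that this sketch
   builds without an OpenCL SDK). */
static budget_report check_budget(const buffer_record *bufs, size_t count,
                                  unsigned long long max_alloc,
                                  unsigned long long global_mem)
{
    budget_report r = {0, 0, 0, 0};
    for (size_t i = 0; i < count; ++i) {
        r.total += bufs[i].size;
        if (bufs[i].size > r.largest)
            r.largest = bufs[i].size;
        if (bufs[i].size > max_alloc)
            r.over_max_alloc = 1;
    }
    r.over_global_mem = (r.total > global_mem);
    return r;
}
```

With buffers adding up to 42MB and limits of 256MB and 1024MB, neither flag is set, which is why the sizes alone do not explain the failure.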
clCreateBuffer does not necessarily allocate the buffer on the device, so you would not get the error at buffer creation time. The error is typically returned by a later call that actually uses the buffer, such as clEnqueueWriteBuffer or the kernel enqueue, though that does not help much in debugging the reason either, since the return codes are pretty vague. I would recommend stepping through your application with something like CodeXL or gDEBugger.
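Since the return codes are vague, it can at least help to log which host call returned which code. Below is a minimal sketch of such a helper; `cl_error_name` and `check_cl` are hypothetical names, not part of OpenCL. The numeric values are copied from the Khronos `CL/cl.h` header and hard-coded only so the snippet builds without an OpenCL SDK.

```c
#include <stdio.h>

/* Values mirror CL/cl.h (OpenCL 1.1). Real host code should include
   <CL/cl.h> and use the macros instead of these literals. */
static const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -61: return "CL_INVALID_BUFFER_SIZE";
    default:  return "unrecognized OpenCL error";
    }
}

/* Hypothetical helper: print the failing step and its error name.
   Returns 1 on success and 0 on failure so the caller can stop early. */
static int check_cl(int err, const char *step)
{
    if (err == 0)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", step, cl_error_name(err), err);
    return 0;
}
```

One way to use it, assuming the driver allocates lazily as described above, is to wrap every clEnqueueWriteBuffer and clEnqueueNDRangeKernel return value in `check_cl` with a descriptive step label, and to call clFinish after each one while debugging so that a deferred failure surfaces next to the call that triggered it.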