To optimize a kernel I need to copy a cl_mem object with an offset.
count_buffer3[n] = count_buffer[n+1]
is the desired result.
Looking at the specification of clEnqueueCopyBuffer, it seems to be possible with a single argument.
cl_int clEnqueueCopyBuffer ( cl_command_queue command_queue,
cl_mem src_buffer,
cl_mem dst_buffer,
size_t src_offset,
size_t dst_offset,
size_t cb,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list,
cl_event *event)
My idea was to set dst_offset to 1, so that copy_buffer[0] goes to copy_buffer[1]. In my case the command looks like:
clEnqueueCopyBuffer(command_queue, count_buffer, count_buffer3, 1, 0, (inCount1 + 1) * sizeof(int), NULL, NULL, NULL);
So I want to copy count_buffer to count_buffer3 with an offset of 1. The result should look like this:
count_buffer[1] = 2
count_buffer[2] = 12
count_buffer[3] = 26
count_buffer3[1] = 12
count_buffer3[2] = 26
Unfortunately, if my dst_offset is 1 as shown in the example, my entire count_buffer3 object contains only "0" int values.
If my offset is 0, the copy works fine and both count buffers are identical.
Additional information: here is the initialization of the cl_mem objects:
cl_mem count_buffer3 = clCreateBuffer(context, CL_MEM_READ_WRITE, (inCount1 + 1) * sizeof(int), NULL, &err); errWrapper("create Buffer", err);
cl_mem count_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, (inCount1+1) * sizeof(int), NULL, &err); errWrapper("create Buffer", err);
I am using Intel INDE Update 2 with Visual Studio 2013.
Am I doing something wrong here, or should the copy with an offset work like this?
Edit: I reduced the buffer size by one and the result changed. Instead of all "0" I now get some very large numbers.
Example from the debugger:
count_buffer[0] = 0
count_buffer[1] = 31
count_buffer[2] = 31
count_buffer3[0] = 520093696
count_buffer3[1] = 520093696
count_buffer3[2] = 520093696
It is an improvement over the "0" values, but still wrong. Any ideas?
Thanks for the answer so far!
The offset is in bytes, not in elements. You probably want an offset of sizeof(int), the size of one element of count_buffer, and a size of (n - 1) * sizeof(int), where n is the number of elements in the buffer.
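To illustrate why a byte offset of 1 gives the values from the question, here is a minimal host-side sketch. It does not use OpenCL at all: copy_bytes, demo_offsets and misaligned_first_value are hypothetical helpers that only imitate the byte-based offset and size semantics of clEnqueueCopyBuffer with memcpy. It assumes 32-bit ints and a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for clEnqueueCopyBuffer on host memory.
   Offsets and cb are in BYTES. An out-of-range request copies nothing
   and returns -1 (the real call reports CL_INVALID_VALUE there). */
int copy_bytes(const void *src, size_t src_size,
               void *dst, size_t dst_size,
               size_t src_offset, size_t dst_offset, size_t cb)
{
    if (src_offset > src_size || cb > src_size - src_offset ||
        dst_offset > dst_size || cb > dst_size - dst_offset)
        return -1;
    memcpy((unsigned char *)dst + dst_offset,
           (const unsigned char *)src + src_offset, cb);
    return 0;
}

void demo_offsets(void)
{
    const int32_t src[3] = {0, 31, 31};
    int32_t dst[3] = {0, 0, 0};

    /* Offset 1 with the full buffer size runs one byte past the end,
       so the request is rejected and dst keeps its old contents. */
    assert(copy_bytes(src, sizeof src, dst, sizeof dst,
                      1, 0, sizeof src) == -1);
    assert(dst[0] == 0 && dst[1] == 0 && dst[2] == 0);

    /* Offset of one element in bytes, and one element fewer to copy:
       dst[n] = src[n + 1], which is the desired result. */
    assert(copy_bytes(src, sizeof src, dst, sizeof dst,
                      sizeof(int32_t), 0, 2 * sizeof(int32_t)) == 0);
    assert(dst[0] == 31 && dst[1] == 31 && dst[2] == 0);
}

/* Offset 1 with a reduced size: the copy succeeds but starts in the
   middle of an int. On a little-endian host this returns
   0x1F000000 = 520093696, the value seen in the question. */
int32_t misaligned_first_value(void)
{
    const int32_t src[3] = {0, 31, 31};
    int32_t dst[3] = {0, 0, 0};
    copy_bytes(src, sizeof src, dst, sizeof dst,
               1, 0, 2 * sizeof(int32_t));
    return dst[0];
}
```

This would also explain the all-zero result in the original attempt: with the full buffer size and an offset of 1, the source range ends past the buffer, so the real call most likely failed with CL_INVALID_VALUE and copied nothing (the return code was not checked). Assuming the buffers hold inCount1 + 1 ints as in the question, the corrected call would probably be clEnqueueCopyBuffer(command_queue, count_buffer, count_buffer3, sizeof(int), 0, inCount1 * sizeof(int), 0, NULL, NULL). Note that num_events_in_wait_list is a cl_uint, so 0 is the appropriate value there rather than NULL.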