Working around pyopencl array offset limitation

Is there a way to work around the limitation in PyOpenCL whereby:

array.data

fails with

pyopencl.array.ArrayHasOffsetError: The operation you are attempting does not yet support arrays that start at an offset from the beginning of their buffer.

I tried:

a.base_data[a.offset: a.offset + a.nbytes]

This seems to work sometimes, but other times I get:

pyopencl.LogicError: clCreateSubBuffer failed: invalid value

There are 2 answers

Answer by Pavan Yalamanchili

clCreateSubBuffer requires that the offset (which the API calls the origin) be aligned, and that origin + size fall within the bounds of the buffer. The specification lists the relevant errors:

CL_INVALID_VALUE is returned in errcode_ret if the region specified by (origin, size) is out of bounds in buffer.

CL_MISALIGNED_SUB_BUFFER_OFFSET is returned in errcode_ret if there are no devices in context associated with buffer for which the origin value is aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value.

For the particular error you are seeing, it looks like either your program or pyopencl is miscalculating the size of the array after the offset, so that origin + size runs past the end of the buffer. Even if you fix this, you may still have problems if the offset is not aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN.

That said, NVIDIA seems to break from the spec and allow arbitrary offsets, so your mileage may vary depending on the hardware.
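The conditions quoted above can be checked on the host before asking for a sub-buffer. The following is a minimal sketch in plain Python, not part of pyopencl: the function name check_sub_buffer is made up for illustration, and the alignment of 1024 bits is only an example value. CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits, so it has to be converted to bytes; in pyopencl the real value should be available as the device's mem_base_addr_align attribute. The order in which a driver reports these errors is an assumption here.

```python
def check_sub_buffer(origin, size, buffer_size, align_bits):
    """Return the name of the error clCreateSubBuffer is expected to
    report for (origin, size), or None if the region looks valid.

    origin, size and buffer_size are in bytes. align_bits is in bits,
    because CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits.
    """
    if size == 0:
        return "CL_INVALID_BUFFER_SIZE"
    # The region (origin, size) must lie inside the parent buffer.
    if origin < 0 or size < 0 or origin + size > buffer_size:
        return "CL_INVALID_VALUE"
    # The origin must be a multiple of the device's base address alignment.
    align_bytes = align_bits // 8
    if align_bytes and origin % align_bytes != 0:
        return "CL_MISALIGNED_SUB_BUFFER_OFFSET"
    return None


# Example: a 5x5 float32 array occupies 100 bytes.
# 1024 bits (128 bytes) is an assumed alignment, not a queried one.
print(check_sub_buffer(24, 100, 100, 1024))  # region runs past the end
print(check_sub_buffer(24, 16, 100, 1024))   # in bounds, origin unaligned
print(check_sub_buffer(0, 100, 100, 1024))   # whole buffer
```

With these example numbers, an offset of 24 bytes is in bounds only if the requested size is at most 76 bytes, and it is still not a multiple of 128, so a spec-conforming driver would be expected to reject it either way.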

Answer by hunse

If you're just looking to get a buffer that marks the start of the array data, to pass to a kernel, you don't have to worry about the size. Here's a function that gets a size-1 buffer that points to the start of the offset data:

def data_ptr(array):
    if array.offset:
        return array.base_data.get_sub_region(array.offset, 1)
    else:
        return array.data

Here's an example in which I set a sub-region clV of array clA to the value 3, using data_ptr to get a pointer to the start of clV's data.

import numpy as np
import pyopencl as cl
import pyopencl.array
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

m, n = 5, 5
A = np.random.uniform(size=(m, n)).astype(np.float32)
clA = cl.array.Array(queue, A.shape, A.dtype)
clA.set(A)

clV = clA[1::2, 1::2]

def data_ptr(array):
    # Size-1 sub-buffer that starts at the array's offset into its base buffer
    if array.offset:
        return array.base_data.get_sub_region(array.offset, 1)
    else:
        return array.data

source = """
__kernel void fn(long si, long sj, __global float *Y)
{
    const int i = get_global_id(0);
    const int j = get_global_id(1);
    Y[i*si + j*sj] = 3;
}
"""
kernel = cl.Program(ctx, source).build().fn
gsize = clV.shape
lsize = None

# Strides in elements, kept as int64 to match the kernel's `long` arguments
estrides = np.array(clV.strides, dtype=np.int64) // clV.dtype.itemsize
kernel(queue, gsize, lsize, estrides[0], estrides[1], data_ptr(clV))

print(clA.get())
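If the sub-buffer itself is rejected because the offset is not aligned, another option is to skip clCreateSubBuffer entirely: pass the whole base buffer to the kernel along with the offset measured in elements, and add that offset inside the kernel. The sketch below is not from either answer. The kernel name fn_offset and the helper element_layout are made up for illustration, and the block only does the host-side arithmetic, so it runs without an OpenCL device.

```python
# Hypothetical variant of the kernel above: it takes the element offset as an
# extra argument and indexes from the start of the base buffer.
SOURCE_WITH_OFFSET = """
__kernel void fn_offset(long offset, long si, long sj, __global float *Y)
{
    const int i = get_global_id(0);
    const int j = get_global_id(1);
    Y[offset + i*si + j*sj] = 3;
}
"""


def element_layout(offset_bytes, strides_bytes, itemsize):
    """Convert a byte offset and byte strides into element units."""
    if offset_bytes % itemsize or any(s % itemsize for s in strides_bytes):
        raise ValueError("offset and strides must be multiples of itemsize")
    offset = offset_bytes // itemsize
    strides = tuple(s // itemsize for s in strides_bytes)
    return offset, strides


# Layout of clA[1::2, 1::2] for a C-contiguous 5x5 float32 array:
# the view starts at element (1, 1), i.e. (1*5 + 1) * 4 = 24 bytes in,
# and steps two rows (40 bytes) or two columns (8 bytes) at a time.
offset, (si, sj) = element_layout(24, (40, 8), 4)
print(offset, si, sj)
```

With pyopencl, the values would come from clV.offset, clV.strides and clV.dtype.itemsize, and the call would look something like kernel(queue, gsize, lsize, np.int64(offset), np.int64(si), np.int64(sj), clV.base_data).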