I have written a piece of code in MATLAB (R2018a) that is a hybrid between standard MATLAB code and CUDA code, linked by compiling with mexcuda. The core loop in my code interpolates a matrix, say from size [n x m] to [N x M]. I have sped up this part using the GPU.
Since this interpolation sits inside a loop, and since the sizes of the matrices I interpolate (both before and after) are the same in every iteration, I want to speed up the application by preallocating an array of the output size on the GPU. So I want to do something like: zeros(N,M,'gpuArray')
once at the start, provide it as input to the mexFunction, and write the interpolated matrix into this array. This would save quite a bit of allocation time (roughly (N_iterations - 1) * allocation_time).
My problem now is: I cannot figure out whether this is possible. Using a mexFunction() as the entry point, the only way I know of to retrieve input arrays is something along the lines of:
mxGPUArray const *in = mxGPUCreateFromMxArray(prhs[0]);
float const *dev_in = (float const *)(mxGPUGetDataReadOnly(in));
but, as the name suggests, this gives read-only access. I cannot use mxGPUGetData(in), because the mxGPUArray is const: one cannot initialise a non-const pointer from it. Does anyone know of a way around this issue that does not involve allocating the array inside the mexFunction?
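For reference, the only writable retrieval I am aware of makes a device-side copy, which defeats the purpose of preallocating. A minimal sketch, assuming the preallocated array were passed as the second input (prhs[1]):

    mxGPUArray *out = mxGPUCopyFromMxArray(prhs[1]); /* writable, but copies the data */
    float *dev_out = (float *)(mxGPUGetData(out));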
EDIT:
The code below shows two C code samples, where the first is an analogy for my current code, and the second is what I am striving for:
Current:
#include "stdio.h"
int main(const int argc, const char *argv[]) {
// Allocate input matrix and fill from input arguments
FILE *fPtr; fPtr = fopen(argv[1],"rb");
double *mat_in = malloc(n*m*sizeof(*mat_in));
mat_in = fread(mat_in, sizeof(*mat_in), n*m, fPtr);
fclose(fPtr);
double *mat_out;
for (int it = 0, it < 1000, it++) {
// Allocate output array and fill it;
mat_out = malloc(N*M*sizeof(*mat_out));
interpolation_function(mat_in, mat_out);
// Do stuff with mat_out
free(mat_out);
}
// Free mat_in, do more stuff and/or exit program
Idea:
#include "stdio.h"
int main(const int argc, const char *argv[]) {
// Allocate input matrix and fill from input arguments
FILE *fPtr; fPtr = fopen(argv[1],"rb");
double *mat_in = malloc(n*m*sizeof(*mat_in));
mat_in = fread(mat_in, sizeof(*mat_in), n*m, fPtr);
fclose(fPtr);
// Allocate output array once at the start:
double *mat_out = malloc(N*M*sizeof(*mat_out));
for (int it = 0, it < 1000, it++) {
interpolation_function(mat_in, mat_out); // Fills mat_out
// Do stuff with mat_out here;
}
free(mat_out);
// Free mat_in, do more stuff and/or exit program
The above two are (at least in my mind) an analogy for the following MATLAB-CUDA hybrid code:
Current (MATLAB): the mexcuda function needs to allocate memory for the interpolation of input(:,:,indx):
accumresult = zeros(N,M);
input = randn(100,100,1000);
for indx = 1:1000
input_slice = mexcuda_interpolation( input(:,:,indx) );
accumresult = accumresult + foo( input_slice, other_parameters);
end
Idea: the memory allocation is moved out of the mexcuda function (and thus out of the core loop), and the mexcuda function only needs to retrieve a pointer to this (writable) array:
accumresult = zeros(N,M,'gpuArray');
placeholder = zeros(N,M,'gpuArray'); % Memory allocated on GPU once here
input = randn(100,100,1000);
for indx = 1:1000
accumresult = accumresult + foo( mexcuda_interpolation(input(:,:,indx), placeholder), other_parameters);
% mexcuda_interpolation() somehow gets a pointer to the preallocated array, which it can write to
end
Note that there is indeed a possibility to parallelise this further; as stated, I am at an intermediate step of parallelising the entire thing.
For your mex code, use mxGPUCreateGPUArray instead of mxGPUCreateFromMxArray; it allocates memory on the device without initializing it.
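A minimal sketch of that pattern (interp_kernel, grid, block, and the output size N x M are placeholders here; single-precision real data assumed):

    #include "mex.h"
    #include "gpu/mxGPUArray.h"

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, mxArray const *prhs[]) {
        mxInitGPU();
        mxGPUArray const *in = mxGPUCreateFromMxArray(prhs[0]);
        float const *dev_in = (float const *)(mxGPUGetDataReadOnly(in));

        // Allocate the [N x M] output on the device without initialising it:
        mwSize dims[2] = {N, M};  // N, M assumed known
        mxGPUArray *out = mxGPUCreateGPUArray(2, dims, mxSINGLE_CLASS, mxREAL,
                                              MX_GPU_DO_NOT_INITIALIZE);
        float *dev_out = (float *)(mxGPUGetData(out));  // writable device pointer

        interp_kernel<<<grid, block>>>(dev_in, dev_out);  // placeholder kernel launch

        plhs[0] = mxGPUCreateMxArrayOnGPU(out);
        mxGPUDestroyGPUArray(in);
        mxGPUDestroyGPUArray(out);
    }

Note that the allocation still happens on every call; it is simply left uninitialised, so you skip the cost of zero-filling.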
About your MATLAB code: why are you preallocating in the first place? Make sure you understand the principle behind it, because it changes when GPUs are involved.

In MATLAB, if you don't preallocate, then every time you append new data, what MATLAB does under the hood is create a new array of the new size and copy the data from the smaller old array into it. Of course this is discouraged, as you are doing unnecessary copies all the time.
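For instance, a growing-array loop like this one triggers that reallocate-and-copy on every pass (a deliberately bad sketch):

    x = [];
    for k = 1:1000
        x(end+1) = k;  % allocates a larger array and copies the old contents
    end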
In CUDA, this is not possible: dynamic arrays do not exist. Especially because whatever you are doing does not happen serially in a for loop; it happens "at the same time". Therefore it is fundamental that you know the size of the output before you perform an operation.
So when you have GPU arrays A and B and you apply f() to them, f needs to know the output size. If you write C = f(A,B) in MATLAB, you do not need to preallocate C (in fact, in this example you would not need to even without GPU computing); MATLAB is smart enough to do it for you. So either you need to see why preallocating C in the first snippet below is a waste of time, or your code looks like the second snippet.
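Something along these lines, with f standing in for whatever operation you apply (a minimal sketch; f, A, and B are placeholders):

    A = gpuArray.rand(100);
    B = gpuArray.rand(100);
    C = zeros(size(A), 'gpuArray');  % wasted work: the next line creates C anyway
    C = f(A, B);

versus

    for ii = 1:numel(A)
        C(ii) = f(A(ii), B(ii));  % elementwise calls in a serial loop
    end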
The second pattern fundamentally means that you are not getting any benefit from GPU computing, as you are separating the parallelizable elements.