I'm new to SYCL and C++. This is my kernel for simple matrix multiplication using 2D std::vector
.
void MatrixMulParallel(queue& q,
const std::vector<std::vector<double>>& a_host,
const std::vector<std::vector<double>>& b_host,
std::vector<std::vector<double>>& c_gpu) {
/*
To Multiply: C[M][P] = A[M][N] * B[N][P]
*/
PROFILE_FUNCTION();
try {
size_t M = a_host.size();
size_t N = a_host[0].size();
size_t P = b_host[0].size();
// Create device buffers for A, B, C
buffer a(a_host.data(), range<2>{M, N});
buffer b(b_host.data(), range<2>{N, P});
buffer c(c_gpu.data(), range<2>{M, P});
PROFILE_SCOPE("Starting Multiply on GPU");
std::cout << "GPU::Multiplying A and B into C.\n";
auto e = q.submit([&](handler& h) {
auto A = a.get_access<access::mode::read>(h);
auto B = b.get_access<access::mode::read>(h);
auto C = c.get_access<access::mode::write>(h);
h.parallel_for(range<2>{M, P}, [=](id<2> index) {
// index[0] allows accessing ROW index, index[1] is column index
int row = index[0];
int col = index[1];
auto sum = 0.0;
for (int i = 0; i < N; i++)
sum += A[row][i] * B[i][col]; // Error #1
C[index] = sum; // Error #2
});
});
e.wait();
}
catch (sycl::exception const& e) {
std::cout << "An exception is caught while multiplying matrices.\n";
terminate();
}
}
I get two errors, indicated along the lines:
- Error #1:
invalid operands to binary expression ('const std::vector<double, std::allocator<double>>' and 'const std::vector<double, std::allocator<double>>')
- Error #2:
no viable overloaded '='
I tried looking for errors similar to invalid operands for binary expression (...)
, but none of them seem to help debug my specific case. Maybe because this ain't beginner-friendly.
From what I understood so far, a_host.data()
shows a return type std::vector<double>
(shouldn't it be std::vector< std::vector<double> >
?).
I have tried using std::array
with statically known sizes, and it works.
How can I make this work using 2D std::vector
?
Any help would be appreciated.
A 2D
std::vector<std::vector<T>>
does not have elements stored contiguously in the memory.A better way around would be to declare
std::vector<T>
with sizes M*N, i.e. linear arrays, and operate on them as contiguous blocks.Since the destination vector
C
, is supposed to be 2D, create a kernel that indexes in both rows and cols. SYCLindex
actually fills up in linearly-accessible memory blocks.Here's what I did to make it work using
std::vector
: