Matrix Multiplication on SYCL using 2D std::vector

Question

Matrix Multiplication on SYCL using 2D std::vector

1.5k views Asked by Karan Shah At 31 December 2024 at 17:27

I'm new to SYCL and C++. This is my kernel for simple matrix multiplication using 2D std::vector.


void MatrixMulParallel(queue& q, 
    const std::vector<std::vector<double>>& a_host,
    const std::vector<std::vector<double>>& b_host,
    std::vector<std::vector<double>>& c_gpu) {
    /*
        To Multiply: C[M][P] = A[M][N] * B[N][P]
    */
    PROFILE_FUNCTION();
    try {
        size_t M = a_host.size();
        size_t N = a_host[0].size();
        size_t P = b_host[0].size();
        // Create device buffers for A, B, C
        buffer a(a_host.data(), range<2>{M, N});
        buffer b(b_host.data(), range<2>{N, P});
        buffer c(c_gpu.data(), range<2>{M, P});

        PROFILE_SCOPE("Starting Multiply on GPU");
        std::cout << "GPU::Multiplying A and B into C.\n";
        auto e = q.submit([&](handler& h) {

            auto A = a.get_access<access::mode::read>(h);
            auto B = b.get_access<access::mode::read>(h);
            auto C = c.get_access<access::mode::write>(h);

            h.parallel_for(range<2>{M, P}, [=](id<2> index) {
                // index[0] allows accessing ROW index, index[1] is column index
                
                int row = index[0];
                int col = index[1];
                auto sum = 0.0;
                for (int i = 0; i < N; i++)
                    sum += A[row][i] * B[i][col]; // Error #1
                C[index] = sum; // Error #2
                });
            });
        e.wait();
    }
    catch (sycl::exception const& e) {
        std::cout << "An exception is caught while multiplying matrices.\n";
        terminate();
    }
}

I get two errors, indicated along the lines:

Error #1: invalid operands to binary expression ('const std::vector<double, std::allocator<double>>' and 'const std::vector<double, std::allocator<double>>')
Error #2: no viable overloaded '='

I tried looking for errors similar to invalid operands for binary expression (...), but none of them seem to help debug my specific case. Maybe because this ain't beginner-friendly.

From what I understood so far, a_host.data() shows a return type std::vector<double> (shouldn't it be std::vector< std::vector<double> >?).

I have tried using std::array with statically known sizes, and it works.

How can I make this work using 2D std::vector?

Any help would be appreciated.

Original Q&A

There are 2 answers

Ronan Keryell On 19 October 2020 at 16:33

More generally, avoid non compact data structure when doing HPC. It is less friendly for the memory hierarchy than contiguous array elements and the initialization is complex. Use instead things similar to md_span and md_array (basically Fortran arrays on steroids :-) ).

**Karan Shah** · Accepted Answer · 2020-10-18T10:30:54+00:00

A 2D std::vector<std::vector<T>> does not have elements stored contiguously in the memory.

A better way around would be to declare std::vector<T> with sizes M*N, i.e. linear arrays, and operate on them as contiguous blocks.

Since the destination vector C, is supposed to be 2D, create a kernel that indexes in both rows and cols. SYCL index actually fills up in linearly-accessible memory blocks.

Here's what I did to make it work using std::vector:

template <typename T>
void MatrixMulParallelNaive(queue& q, 
    const std::vector<T>& a_host,
    const std::vector<T>& b_host,
    std::vector<T>& c_gpu) {
    /*
        To Multiply: C[M][P] = A[M][N] * B[N][P]
    */
    PROFILE_FUNCTION();
    try {
        
        buffer<double, 1> a(a_host.data(), range<1>{a_host.size()}); // 1D
        buffer<double, 1> b(b_host.data(), range<1>{b_host.size()}); // 1D
        buffer<double, 2> c(c_gpu.data(), range<2>{M, P}); // Create 2D buffer
        PROFILE_SCOPE("Starting Multiply on GPU");
        std::cout << "GPU::Multiplying A and B into C.\n";
        auto e = q.submit([&](handler& h) {

            auto A = a.get_access<access::mode::read>(h);
            auto B = b.get_access<access::mode::read>(h);
            auto C = c.get_access<access::mode::write>(h);
            
            h.parallel_for(range<2>{M, P}, [=](id<2> index) {
                // Threading index that iterates over C.
                int row = index[0];
                int col = index[1];
                auto sum = 0.0;
                // Compute result of ONE element of C
                for (int i = 0; i < N; i++)
                    sum += A[row * M + i] * B[i * N + col];
                C[index] = sum;
                });
            });
        e.wait();
    }
    catch (sycl::exception const& e) {
        std::cout << "An exception is caught while multiplying matrices.\n";
        terminate();
    }
}

TechQA.

Matrix Multiplication on SYCL using 2D std::vector

There are 2 answers

Related Questions in C++

Related Questions in VECTOR

Related Questions in SYCL

Related Questions in INTEL-ONEAPI

Related Questions in DPC++

Popular Questions

Popular Tags

Trending Questions