Random exit code on giving larger array sizes in DPC++ Vector Addition

I am trying to run a hello-world DPC++ sample of oneAPI which adds two 1-D Arrays on both CPU and GPU, and verifies the results. Code is shown below:

// DataParallel Addition of two Vectors

#include <CL/sycl.hpp>
#include <array>
#include <iostream>
using namespace sycl;

constexpr size_t array_size = 100000;
typedef std::array<int, array_size> IntArray;

// Initialize array with the same value as its index
void InitializeArray(IntArray& a) { for (size_t i = 0; i < a.size(); i++) a[i] = i; }

// Create an asynchronous Exception Handler for sycl
static auto exception_handler = [](cl::sycl::exception_list eList) {
    for (std::exception_ptr const& e : eList) {
        try {
            std::rethrow_exception(e);
        }
        catch (std::exception const& e) {
            std::cout << "Failure" << std::endl;
        }
    }
};

void VectorAddParallel(queue &q, const IntArray& x, const IntArray& y, IntArray& parallel_sum) {
    range<1> num_items{ x.size() };
    buffer x_buf(x);
    buffer y_buf(y);
    buffer sum_buf(parallel_sum.data(), num_items);

    // Submit a command group to the queue by a lambda
    // which contains data access permissions and device computation
    q.submit([&](handler& h) {

        auto xa = x_buf.get_access<access::mode::read>(h);
        auto ya = y_buf.get_access<access::mode::read>(h);
        auto sa = sum_buf.get_access<access::mode::write>(h);

        std::cout << "Adding on GPU (Parallel)\n";
        h.parallel_for(num_items, [=](id<1> i) { sa[i] = xa[i] + ya[i]; });
        std::cout << "Done on GPU (Parallel)\n";
    });

    // queue runs the kernel asynchronously. Once beyond the scope,
    // buffers' data is copied back to the host.
}

int main() {
    default_selector d_selector;
    IntArray a, b, sequential, parallel;

    InitializeArray(a);
    InitializeArray(b);

    try {
        // Queue needs: Device and Exception handler
        queue q(d_selector, exception_handler);
        std::cout << "Accelerator: "
                  << q.get_device().get_info<info::device::name>() << "\n";
        std::cout << "Vector size: " << a.size() << "\n";
        VectorAddParallel(q, a, b, parallel);
    }
    catch (std::exception const& e) {
        std::cout << "Exception while creating Queue. Terminating...\n";
        std::terminate();
    }

    // Do the sequential, which is supposed to be slow
    std::cout << "Adding on CPU (Scalar)\n";
    for (size_t i = 0; i < sequential.size(); i++) {
        sequential[i] = a[i] + b[i];
    }
    std::cout << "Done on CPU (Scalar)\n";

    // Verify results, the old-school way
    for (size_t i = 0; i < parallel.size(); i++) {
        if (parallel[i] != sequential[i]) {
            std::cout << "Fail: " << parallel[i] << " != " << sequential[i] << std::endl;
            std::cout << "Failed. Results do not match.\n";
            return -1;
        }
    }
    std::cout << "Success!\n";
    return 0;
}

With a relatively small array_size (I tested 100 to 50,000 elements), the computation works fine. Sample output:

Accelerator: Intel(R) Gen9
Vector size: 50000
Adding on GPU (Parallel)
Done on GPU (Parallel)
Adding on CPU (Scalar)
Done on CPU (Scalar)

The computation takes barely a second on both CPU and GPU. But when I increase array_size to, say, 100000, I get this cryptic error:

C:\Users\myuser\source\repos\dpcpp-iotas\x64\Debug\dpcpp-iotas.exe (process 24472) exited with code -1073741571.

I am not sure at what precise value the error starts occurring, but it seems to happen above roughly 70000. I have no idea why this is happening. Any insights into what could be wrong?


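Turns out, this is due to the stack size limit enforced by VS. main() declares four std::array<int, array_size> locals, and std::array keeps its elements inside the object itself, so all of them live on the stack. With 100000 elements and a 4-byte int that is about 1.6 MB, more than the default 1 MB stack, which results in a stack overflow.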
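
The arithmetic can be checked at compile time. This is only a sketch: it assumes a 4-byte int and the 1 MB default reserve mentioned below, it ignores everything else that lives on the stack, and the names kDefaultStackReserve, MainArrayBytes and OverflowsDefaultStack are made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Assumption: 1 MiB, the default stack reserve mentioned in this answer.
constexpr std::size_t kDefaultStackReserve = 1024 * 1024;

// Bytes occupied by the four std::array<int, N> locals in main()
// (a, b, sequential, parallel).
template <std::size_t N>
constexpr std::size_t MainArrayBytes() {
    return 4 * sizeof(std::array<int, N>);
}

// True if the four arrays alone already exceed the default reserve.
template <std::size_t N>
constexpr bool OverflowsDefaultStack() {
    return MainArrayBytes<N>() > kDefaultStackReserve;
}

// With a 4-byte int: 50000 elements need 800000 bytes, 100000 need 1600000.
static_assert(!OverflowsDefaultStack<50000>(), "50000 elements fit");
static_assert(OverflowsDefaultStack<100000>(), "100000 elements do not fit");
```

Under these assumptions the break-even point is 65536 elements (4 arrays x 4 bytes x 65536 = 1 MiB), which is consistent with the failures observed around 70000.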

As mentioned by @user4581301, the exit code -1073741571, read as an unsigned 32-bit value, is C00000FD in hex, which is the Windows status code for a stack overflow (STATUS_STACK_OVERFLOW).
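
The conversion from the signed exit code to the hex status code can be reproduced in a few lines. A minimal sketch; the helper names AsStatusCode and ToHex are hypothetical and not part of any Windows API.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Reinterprets a signed 32-bit process exit code as the unsigned
// 32-bit value in which Windows status codes are usually written.
constexpr std::uint32_t AsStatusCode(std::int32_t exit_code) {
    return static_cast<std::uint32_t>(exit_code);
}

// Formats the value as 8 upper-case hex digits.
std::string ToHex(std::uint32_t status) {
    std::ostringstream out;
    out << std::uppercase << std::hex << std::setw(8) << std::setfill('0')
        << status;
    return out.str();
}
```

Calling ToHex(AsStatusCode(-1073741571)) yields "C00000FD".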

Ways to fix this:

  1. Increase the /STACK reserve above the default of 1 MB in Project Properties > Linker > System > Stack Reserve/Commit values.
  2. Use editbin.exe to change the stack reserve of the built executable (editbin /STACK:reserve). dumpbin.exe with /HEADERS shows the current value.
  3. Use std::vector instead, which allows dynamic allocation (suggested by @Retired Ninja).
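
For the third option, the host-side change could look roughly like the following. This is a sketch in plain C++ without the SYCL part; InitializeVector, VectorAddSequential and MakeAndAdd are illustrative names, not taken from the oneAPI sample. On the device side the buffers would presumably be created from data() and a range, the same way sum_buf is created in the question, but that part is an untested assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IntVector = std::vector<int>;

// Initialize vector with the same value as its index
void InitializeVector(IntVector& a) {
    for (std::size_t i = 0; i < a.size(); i++) a[i] = static_cast<int>(i);
}

// Host-side reference addition, same as the scalar loop in main()
void VectorAddSequential(const IntVector& x, const IntVector& y, IntVector& sum) {
    assert(x.size() == y.size() && sum.size() == x.size());
    for (std::size_t i = 0; i < sum.size(); i++) sum[i] = x[i] + y[i];
}

// Hypothetical helper: builds both inputs and returns their element-wise sum.
// The vector objects are small; their elements are allocated on the heap,
// so n is no longer bounded by the stack reserve.
IntVector MakeAndAdd(std::size_t n) {
    IntVector a(n), b(n), sum(n);
    InitializeVector(a);
    InitializeVector(b);
    VectorAddSequential(a, b, sum);
    return sum;
}
```

With this layout, MakeAndAdd(1000000) runs without touching the stack limit, since only the small vector objects themselves sit on the stack.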

I couldn't find the /STACK option in its usual place in the Linker properties of the oneAPI project, so I decided to go with dynamic allocation.

Ronan Keryell On

When I program big applications I always do a

ulimit -s unlimited

to explain to the shell that I am grown up and I really want some space on my stack.

This is the bash syntax, but you can adapt it to other shells.
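
On POSIX systems, the limit that ulimit -s changes can also be read from inside a program with getrlimit and RLIMIT_STACK. A minimal sketch, assuming a POSIX platform (it will not build with MSVC); the helper names SoftStackLimit and PrintStackLimit are made up for illustration.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX only)

#include <iostream>
#include <optional>

// Returns the soft stack limit in bytes, or std::nullopt if the query fails.
// The value is RLIM_INFINITY when the limit is "unlimited".
std::optional<rlim_t> SoftStackLimit() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_STACK, &lim) != 0) return std::nullopt;
    return lim.rlim_cur;
}

// Prints the limit in KiB, or "unlimited".
void PrintStackLimit() {
    const std::optional<rlim_t> lim = SoftStackLimit();
    if (!lim) {
        std::cout << "could not query stack limit\n";
    } else if (*lim == RLIM_INFINITY) {
        std::cout << "unlimited\n";
    } else {
        std::cout << (*lim / 1024) << " KiB\n";
    }
}
```

Running PrintStackLimit() before and after ulimit -s unlimited in the launching shell shows whether the new limit was picked up by the process.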

I guess there might be an equivalent for non-UNIX OS?