SYCL - Cannot find the origin of memory corruption


I've been writing a ray tracer using SYCL for a few weeks but I'm now facing a memory corruption issue and I really can't find where it's coming from.

I'm working on Windows 11 22H2 using the Intel oneAPI Base Toolkit and Visual Studio Community 2022 17.7.6.

On Windows, the output image is visibly broken.

On Ubuntu 20.04 (using the oneAPI Base Toolkit 2023.2.1), the output image looks fine, but I assume the corruption is still present because running the code on the GPU with gpu_selector_v (laptop, integrated Iris Xe) almost always crashes my Intel graphics driver. On the CPU on Ubuntu, the output image is normal and there doesn't seem to be any corruption (it's probably still there, just not showing).

On Ubuntu, I tried using valgrind to monitor the application on the CPU but it couldn't find any memory read/write violation. No errors were reported.

On Windows, Dr. Memory (a Valgrind equivalent) crashes when executing the code (pastebin of the crash here), so it can't report the source of the error.

Broken output on Windows

This corruption issue only arises when compiling in Release mode with optimizations on (/O2 in Visual Studio). There seem to be no issues in Debug mode or in Release mode with optimizations disabled.

I managed to narrow the issue down to a single line of code at render_kernel.cpp:104, called from render_kernel.cpp:60:

bool RenderKernel::intersect_scene(const Ray ray, HitInfo& closest_hit_info) const
{
    float closest_intersection_distance = -1.0f;
    bool inter_found = false;

    for (int i = 0; i < m_triangle_buffer_access.size(); i++)
    {
        const Triangle& triangle = m_triangle_buffer_access[i];

        HitInfo hit_info;
        if (triangle.intersect(ray, hit_info))
        {
            if (hit_info.t < closest_intersection_distance || !inter_found)
            {
                closest_intersection_distance = hit_info.t;
                
                ///////////
                closest_hit_info = hit_info; //Problematic line
                ///////////

                inter_found = true;
            }
        }
    }

    return inter_found;
}

In the image above, even the red background has artifacts. If you comment out the line that assigns the intersection information to the output 'closest_hit_info' argument (line 18 in the snippet above), you obviously can't see the triangle anymore because the hit information is no longer updated. However, the red background is also no longer full of artifacts.

Reducing the number of bounces below 4 (render_kernel.h) also makes the issue go away. 4 or 5 bounces always show the artifacts in the output image whereas 3 or less bounces don't show any issue. This makes little to no sense as the scene doesn't even have enough geometry for bounces to happen so the execution should be the same with 2 or more bounces.
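The expectation about bounces can be sketched in plain C++. The loop below is only a hypothetical stand-in for the path tracing loop (the names `SceneStub`, `trace_path`, and `max_bounces` are assumptions, not taken from the repo): once a ray misses all geometry the loop breaks, so raising the bounce cap past the first miss should not change how many iterations run.

```cpp
// Hypothetical stand-in for a scene: rays hit geometry for the first
// `hits_available` bounces and miss afterwards.
struct SceneStub {
    int hits_available;
    bool intersect(int bounce) const { return bounce < hits_available; }
};

// Returns the number of loop iterations actually executed for one path.
int trace_path(const SceneStub& scene, int max_bounces) {
    int iterations = 0;
    for (int bounce = 0; bounce < max_bounces; ++bounce) {
        ++iterations;
        if (!scene.intersect(bounce)) {
            break; // Ray escaped to the background: the path ends here.
        }
    }
    return iterations;
}
```

With a scene where only the primary ray can hit something (`SceneStub{1}`), `trace_path` runs the same two iterations for any cap of 2 or more, which is why a difference between 3 and 4 bounces is surprising.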

The full code base can be found in this repo, on the branch "MinimalCorruption". I tried to make it as minimal as possible.

If more details are needed about my system / installation, feel free to ask.

EDIT (11/14/2023): The issue is being tracked on the Intel forum.

EDIT (01/18/2024): Intel replied to my post on their forum and it seems like this is a compiler issue.


There are 2 answers

Tom Clabault (best answer)

Intel replied to my post on their forum and it seems like the issue wasn't related to my code but rather to a bug in the compiler.

They're working on the issue internally.

Ronan Keryell

Have you tried running on the CPU with a pure C++ library implementation like AdaptiveCpp? Then you should be able to use the usual C++ CPU debugging tools such as UBSan, ThreadSanitizer, Valgrind, and so on.
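A related way to rule out a logic error is to copy the closest-hit loop into a host-only translation unit with no SYCL at all and build it with sanitizers (for example `-fsanitize=address,undefined` on GCC or Clang). The sketch below is not the code from the repo: `Vec3`, `Ray`, `HitInfo`, and `Triangle` are simplified, hypothetical stand-ins, a `std::vector` replaces the accessor, and the intersection routine is a standard Möller-Trumbore test assumed for illustration. It does not exercise the SYCL device compiler, so it can only tell you whether the loop itself is sound.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-ins for the types used in the question.
struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, direction; };
struct HitInfo { float t = -1.0f; int triangle_index = -1; };

struct Triangle {
    Vec3 a, b, c;

    // Moller-Trumbore ray/triangle test; writes hit.t on success.
    bool intersect(const Ray& ray, HitInfo& hit) const {
        const float eps = 1e-6f;
        Vec3 e1 = sub(b, a), e2 = sub(c, a);
        Vec3 p = cross(ray.direction, e2);
        float det = dot(e1, p);
        if (std::fabs(det) < eps) return false; // Ray parallel to the triangle.
        float inv = 1.0f / det;
        Vec3 s = sub(ray.origin, a);
        float u = dot(s, p) * inv;
        if (u < 0.0f || u > 1.0f) return false;
        Vec3 q = cross(s, e1);
        float v = dot(ray.direction, q) * inv;
        if (v < 0.0f || u + v > 1.0f) return false;
        float t = dot(e2, q) * inv;
        if (t < eps) return false; // Intersection lies behind the ray origin.
        hit.t = t;
        return true;
    }
};

// Same structure as the loop in the question: keep the nearest hit found so far.
bool intersect_scene(const std::vector<Triangle>& triangles, const Ray& ray,
                     HitInfo& closest) {
    bool found = false;
    for (std::size_t i = 0; i < triangles.size(); ++i) {
        HitInfo hit;
        if (triangles[i].intersect(ray, hit) && (!found || hit.t < closest.t)) {
            hit.triangle_index = static_cast<int>(i);
            closest = hit; // Counterpart of the assignment flagged in the question.
            found = true;
        }
    }
    return found;
}
```

If this version runs cleanly under the sanitizers while the device build still shows artifacts, that points away from the loop and toward the toolchain, which is consistent with Intel's reply.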