I have some experience with compute shaders in HLSL. I'm currently developing a tool for the Unity engine that does something like texture baking: it takes a low-poly mesh and casts rays from its surface toward a high-poly mesh.
I've run into a race condition and don't yet know how to solve it.
Algorithm Description:
My shader receives an input 2D image, _PositionMap, in which each pixel contains the ray origin coordinates on the low-poly surface. The goal of the shader is to raycast against the high-poly surface and fill the buffer RWStructuredBuffer<HitInfo> _HitInfo. This buffer has one element per _PositionMap pixel and stores the high-poly surface data for that pixel: the distance from the low-poly surface (depth, initialized to +infinity before the shader is executed), the triangle index, and the barycentric coordinates.
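The +infinity initialization described above can be done on the CPU or on the GPU. As a rough sketch of the GPU route, assuming a separate kernel run before the raycast pass (the kernel name `ClearHitInfo` and the `0xFFFFFFFF` "no triangle" sentinel are hypothetical and not part of the tool), it might look like this:

```hlsl
#pragma kernel ClearHitInfo

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

RWStructuredBuffer<HitInfo> _HitInfo;
uint _Width;  // position map width in pixels
uint _Height; // position map height in pixels

[numthreads(8, 8, 1)]
void ClearHitInfo(uint3 id : SV_DispatchThreadID)
{
    // Skip padding threads when the map size is not a multiple of 8.
    if (id.x >= _Width || id.y >= _Height)
        return;

    HitInfo empty;
    empty.depth = asfloat(0x7F800000u);  // IEEE-754 bit pattern of +infinity
    empty.barycentric = float2(0.0, 0.0);
    empty.triangleIndex = 0xFFFFFFFFu;   // hypothetical "no triangle" sentinel

    // Every element in [0, _Width * _Height) is written exactly once.
    _HitInfo[id.y * _Width + id.x] = empty;
}
```

Because every element is reset regardless of how the raycast kernel later maps pixels to indices, the clear pass does not depend on the index formula used elsewhere.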
Current implementation:
One thread runs per _PositionMap pixel. Each thread iterates over all the triangles and checks whether the ray hits each one. If the hit point is closer than what is already recorded in _HitInfo[pixelIndex], the record is overwritten. Because each thread only touches the element for its own pixel, there is no race condition here. This is what the algorithm looks like now:
#pragma kernel Main

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

// mesh buffers declarations
Texture2D _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
uint _Width;     // position map width (total threads along X)
uint _Height;    // position map height (total threads along Y)
uint _Triangles; // mesh triangle count

[numthreads(16, 16, 1)]
void Main(uint3 id : SV_DispatchThreadID)
{
    uint pixelIndex = id.x * _Width + id.y;
    // iterating over each triangle in the mesh
    for (uint i = 0; i < _Triangles; i++)
    {
        // performing a raycast from _PositionMap[id.xy] against the triangle at index i
        HitInfo triangleHitInfo = ...
        // performing a depth test
        bool depthTest = triangleHitInfo.depth <= _HitInfo[pixelIndex].depth;
        if (depthTest)
        {
            // overwriting previously stored data
            _HitInfo[pixelIndex] = triangleHitInfo;
        }
    }
}
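As a side note on the per-pixel loop, a minimal sketch of a variant that keeps the closest hit in a local variable and touches _HitInfo only twice per thread could look like the following. It is an illustration under stated assumptions, not the tool's actual code: `RaycastTriangle` is a hypothetical placeholder for the ray/triangle test elided in the post (it always reports a miss here), and the sketch uses a row-major index `id.y * _Width + id.x`, whereas the post's `id.x * _Width + id.y` addresses the same set of elements only for square maps.

```hlsl
#pragma kernel Main

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

Texture2D _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
uint _Width;     // position map width in pixels
uint _Height;    // position map height in pixels
uint _Triangles; // mesh triangle count

// Hypothetical stand-in for the real ray/triangle intersection.
// It always returns a miss (+infinity depth); replace it with the actual test.
HitInfo RaycastTriangle(float3 origin, uint triangleIndex)
{
    HitInfo hit;
    hit.depth = asfloat(0x7F800000u); // +infinity means "no hit"
    hit.barycentric = float2(0.0, 0.0);
    hit.triangleIndex = triangleIndex;
    return hit;
}

[numthreads(16, 16, 1)]
void Main(uint3 id : SV_DispatchThreadID)
{
    // Skip padding threads when the map size is not a multiple of 16.
    if (id.x >= _Width || id.y >= _Height)
        return;

    uint pixelIndex = id.y * _Width + id.x; // row-major (assumption, see above)
    float3 origin = _PositionMap[id.xy].xyz;

    // Read the stored record once, refine it locally, write it back once.
    HitInfo best = _HitInfo[pixelIndex];
    for (uint i = 0; i < _Triangles; i++)
    {
        HitInfo hit = RaycastTriangle(origin, i);
        if (hit.depth <= best.depth)
        {
            best = hit;
        }
    }
    _HitInfo[pixelIndex] = best;
}
```

The depth test itself is unchanged; only the number of buffer reads and writes inside the loop differs.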
Updated implementation:
I think performance could be improved by parallelizing the loop across the third thread dimension. Each thread would then perform a single raycast, against the one triangle that corresponds to its id.z.
With this approach, several threads read and write _HitInfo[pixelIndex] concurrently. I tried to serialize that access by introducing an additional buffer, RWBuffer<uint> _ZCounters. It has one element per _PositionMap pixel and is initialized with zeros before the shader is executed. This is what the updated algorithm looks like:
#pragma kernel Main

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

// mesh buffers declarations
Texture2D _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
RWBuffer<uint> _ZCounters;
uint _Width;     // position map width (total threads along X)
uint _Height;    // position map height (total threads along Y)
uint _Triangles; // mesh triangle count (total threads along Z)

[numthreads(8, 8, 8)]
void Main(uint3 id : SV_DispatchThreadID)
{
    uint pixelIndex = id.x * _Width + id.y;
    // performing a raycast from _PositionMap[id.xy] against the triangle at index id.z
    HitInfo triangleHitInfo = ...
    // waiting for our turn to access resources
    while (true)
    {
        if (_ZCounters[pixelIndex] == id.z)
        {
            // CONCURRENCY SAFE AREA BEGINS
            // performing a depth test
            bool depthTest = triangleHitInfo.depth <= _HitInfo[pixelIndex].depth;
            if (depthTest)
            {
                // overwriting the previously written data
                _HitInfo[pixelIndex] = triangleHitInfo;
            }
            // allowing the next thread to enter this condition block
            _ZCounters[pixelIndex]++;
            // CONCURRENCY SAFE AREA ENDS
            break;
        }
    }
}
Here I use an infinite loop in which every thread repeatedly reads the counter value for its pixel. Only the thread whose id.z equals the counter can be inside the if block, and until it increments the counter, no other thread should be able to read or write _HitInfo[pixelIndex]. I therefore concluded that access to the data in this region is serialized.
Problem:
Unfortunately, my conclusion was incorrect, and this approach did not get rid of the race condition. The _HitInfo buffer ends up with different, seemingly random contents on every run. I just can't figure out where I made a mistake. I can only say that it is unlikely that the shader stops working due to a timeout, because I have been able to freeze the computer with endless loops.
I hope someone can help me solve this problem.
EDIT
I've moved the break statement slightly, and that solved the problem. Here is what changed:
Previously, each thread exited the loop immediately after incrementing the counter:
while (true)
{
    if (_ZCounters[xyIndex] == id.z)
    {
        if (triangleHitInfo.depth <= _HitInfo[xyIndex].depth)
        {
            _HitInfo[xyIndex] = triangleHitInfo;
        }
        _ZCounters[xyIndex]++;
        break;
    }
}
Now each thread keeps spinning in the loop. Once the counter has advanced past all the triangles, all threads exit the loop together:
while (true)
{
    if (_ZCounters[xyIndex] == id.z)
    {
        if (triangleHitInfo.depth <= _HitInfo[xyIndex].depth)
        {
            _HitInfo[xyIndex] = triangleHitInfo;
        }
        _ZCounters[xyIndex]++;
    }
    if (_ZCounters[xyIndex] == _Triangles)
    {
        break;
    }
}
I'd still be deeply grateful if anyone could point out what was causing this behavior.
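For comparison, one common way to avoid spin-waiting on a counter altogether is to let an atomic minimum pick the winner. Compute shaders give no guarantee about the order in which threads run, and plain buffer reads and writes are not atomic, so this sketch relies on `InterlockedMin` instead. It is a minimal illustration under several assumptions, not a drop-in fix: `_MinDepthBits` is a hypothetical extra buffer (one uint per pixel, pre-filled with `0x7F800000`, the bit pattern of +infinity), `RaycastTriangle` is a hypothetical placeholder for the elided ray/triangle test, depths are assumed to be non-negative, and the index is row-major (`id.y * _Width + id.x`), which differs from the formula in the post.

```hlsl
#pragma kernel MinDepth
#pragma kernel WriteHit

struct HitInfo
{
    float depth;
    float2 barycentric;
    uint triangleIndex;
};

Texture2D _PositionMap;
RWStructuredBuffer<HitInfo> _HitInfo;
RWStructuredBuffer<uint> _MinDepthBits; // hypothetical; pre-filled with 0x7F800000
uint _Width;     // position map width in pixels
uint _Height;    // position map height in pixels
uint _Triangles; // mesh triangle count

// Hypothetical stand-in for the real ray/triangle intersection.
// It always returns a miss (+infinity depth); replace it with the actual test.
HitInfo RaycastTriangle(float3 origin, uint triangleIndex)
{
    HitInfo hit;
    hit.depth = asfloat(0x7F800000u);
    hit.barycentric = float2(0.0, 0.0);
    hit.triangleIndex = triangleIndex;
    return hit;
}

// Pass 1: find the smallest depth per pixel with an atomic minimum.
[numthreads(8, 8, 8)]
void MinDepth(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Width || id.y >= _Height || id.z >= _Triangles)
        return;

    uint pixelIndex = id.y * _Width + id.x;
    HitInfo hit = RaycastTriangle(_PositionMap[id.xy].xyz, id.z);

    // For non-negative floats the bit pattern sorts in the same order as the
    // value, so an unsigned integer minimum is also the minimum depth.
    InterlockedMin(_MinDepthBits[pixelIndex], asuint(hit.depth));
}

// Pass 2: dispatched only after MinDepth has completed.
// The thread whose depth equals the per-pixel minimum writes its record.
[numthreads(8, 8, 8)]
void WriteHit(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Width || id.y >= _Height || id.z >= _Triangles)
        return;

    uint pixelIndex = id.y * _Width + id.x;
    HitInfo hit = RaycastTriangle(_PositionMap[id.xy].xyz, id.z);
    uint bits = asuint(hit.depth);

    // Misses (+infinity), NaNs and negative depths never match this test,
    // so pixels without a hit keep their initial contents.
    if (bits < 0x7F800000u && bits == _MinDepthBits[pixelIndex])
    {
        _HitInfo[pixelIndex] = hit;
    }
}
```

Two caveats apply to this sketch. Both passes must produce bit-identical depths for the same ray and triangle, otherwise no thread matches in the second pass. If two triangles hit at exactly the same depth, both threads write, so a tie-break (for example on triangle index) would be needed to make that case deterministic.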