I'm trying to build a global linked list for Forward+ shading, but I've run into some difficulties during the implementation.
Each work group in the compute shader has a shared variable: a local index array whose length varies (the capacity is constant, but the number of valid entries is not). Here's an example:
shared int array[1024]; // the declaration in GLSL
shared int length; // it is also a shared variable in work group.
Group 0: length = 4, array = 3, 5, 7, 1, -1, -1, -1, -1.... (-1 = not valid)
Group 1: length = 2, array = 1, 6, -1, -1, -1....
Group 2: length = 1, array = 8, -1, -1, -1, -1....
Now I want to merge these indices into a global index array, i.e. a Shader Storage Buffer Object. The order is based on the group index:
Global index array: 3, 5, 7, 1, 1, 6, 8, -1, -1, -1 ......
The difficulty is that I don't know how to synchronize between different groups, since barrier() in GLSL only guarantees synchronization within the same group.
Another post also says that OpenGL doesn't support synchronization between different groups:
OpenGL Compute shader sync different work groups
So, my question is: is there any way to achieve my goal?
For example, can I declare some Shader Storage Buffer Objects that hold the ID of the latest group that has finished updating and the current offset into the global index array?
Example:
uint latestGroupIDUpdated = -1; // an SSBO
uint globalIdxOffset = 0; // an SSBO
in each group:
while( myGroupId - 1 != latestGroupIDUpdated )
{ /* keep waiting */ }
// my previous group has updated the global list
globalIdxOffset+= myArrayLength;
latestGroupIDUpdated = myGroupId;
//now start appending the local index array into global index array
Will this attempt work? Or will it fail, and why?
If it will fail, what kind of approach is advised?
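By the look of it, you organized your work groups in 1D. If you call glDispatchCompute(n, 1, 1), you get n work groups (n * X invocations for a local size of X). The n groups execute in parallel and in no guaranteed order, so you cannot know the order in which each group ID gets updated. Using latestGroupIDUpdated will not work: a group that spins while waiting for its predecessor may wait forever, because nothing in OpenGL guarantees that the predecessor gets scheduled while the waiter is busy, and the plain += on globalIdxOffset is not atomic either.
Here is my approach, which relies on the built-in invocation variables.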
You can use gl_GlobalInvocationID.x to index into your global SSBO list to store length. If a group has more than one invocation, use gl_WorkGroupID.x as the slot instead and let a single invocation per group do the write, so that each group stores exactly one value.
All this is just to store the dynamic value length in group order in the global SSBO list. All of these length values will be visible to the next dispatch once you have called glMemoryBarrier() with GL_SHADER_STORAGE_BARRIER_BIT in your C/C++ application.
After that, you have to modify this array so that it stores the inclusive prefix sum of the length array. This process can be highly parallelized. If you're trying to save time, you can do this in a separate compute shader (I suggest looking this up if your length array is long). You can also do this on the CPU.
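For the CPU route, a minimal sketch of the inclusive prefix sum could look like the following. The function name inclusivePrefixSum is made up for illustration, and the sketch assumes the per-group lengths have already been read back from the SSBO into a std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: turns per-group lengths into an inclusive prefix sum.
// prefix[g] is the end offset (one past the last slot) of group g in the
// merged global index array.
std::vector<std::uint32_t> inclusivePrefixSum(const std::vector<std::uint32_t>& lengths) {
    std::vector<std::uint32_t> prefix(lengths.size());
    // std::partial_sum writes the running total, i.e. the inclusive scan.
    std::partial_sum(lengths.begin(), lengths.end(), prefix.begin());
    // Sanity check: sums of unsigned lengths never decrease unless they overflow.
    assert(std::is_sorted(prefix.begin(), prefix.end()));
    return prefix;
}
```

With the lengths from the question (4, 2, 1) this yields 4, 6, 7, so group 1 starts at offset 4 and the total number of indices is 7.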
After you have your inclusive prefix sum length array (let's call it PrefixSumLengthArray), you need to call glDispatchCompute() again to dispatch as many shader invocations as the total length value, which is the last value in your PrefixSumLengthArray (rounded up to a whole number of work groups). Note that shared variables do not persist between dispatches, so the first pass also has to write each group's local array to an SSBO. Then you use gl_GlobalInvocationID to index into your new SSBO list to store your arrays. That will synchronize between different groups for ya!!!
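To make the indexing in the second pass concrete, here is a CPU stand-in written in C++ rather than GLSL. It is only a sketch under assumptions: mergeInGroupOrder is a hypothetical name, each loop iteration plays the role of one invocation, and the binary search for the owning group is one possible way to map an output slot back to its group, not necessarily what the answer had in mind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for the second dispatch. groups[g] holds the valid entries of
// group g's local array; prefix is the inclusive prefix sum of their lengths.
std::vector<int> mergeInGroupOrder(const std::vector<std::vector<int>>& groups,
                                   const std::vector<std::uint32_t>& prefix) {
    assert(groups.size() == prefix.size());
    const std::uint32_t total = prefix.empty() ? 0u : prefix.back();
    std::vector<int> merged(total);
    // Each iteration stands in for one invocation (i ~ gl_GlobalInvocationID.x).
    for (std::uint32_t i = 0; i < total; ++i) {
        // Owning group: first group whose inclusive end offset is greater than i.
        const std::size_t g = static_cast<std::size_t>(
            std::upper_bound(prefix.begin(), prefix.end(), i) - prefix.begin());
        // Start offset of that group is the previous group's end offset.
        const std::uint32_t start = (g == 0) ? 0u : prefix[g - 1];
        merged[i] = groups[g][i - start];
    }
    return merged;
}
```

Called with the three groups from the question and the prefix sum 4, 6, 7, this returns 3, 5, 7, 1, 1, 6, 8. Because every output slot is written by exactly one iteration, no synchronization between slots is needed, which is the property the two-pass approach relies on.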