Here is my situation: I have a RWTexture2D<float4> out which will always be in state D3D12_RESOURCE_STATE_UNORDERED_ACCESS and another RWTexture2D<float4> tex (initialized in state D3D12_RESOURCE_STATE_COPY_SOURCE) and my render loop is like this:
1. Clear out to zero using ClearUnorderedAccessViewFloat
2. Perform DispatchRays, which will read from and write to out
3. Transition tex from D3D12_RESOURCE_STATE_COPY_SOURCE to D3D12_RESOURCE_STATE_UNORDERED_ACCESS
4. Invoke a compute shader which transforms the values in out and stores the result in tex
5. Transition tex from D3D12_RESOURCE_STATE_UNORDERED_ACCESS to D3D12_RESOURCE_STATE_COPY_SOURCE
Currently I'm waiting for the GPU to finish (using a function like wait_for_gpu below) after (1.) and again after (2.). I've noticed that the performance of this is rather poor. So my question is: How can I do this better?
I guess it can be made way more efficient by using (resource) barriers. Beyond ResourceBarrier there is now another method Barrier (see https://microsoft.github.io/DirectX-Specs/d3d/D3D12EnhancedBarriers.html) and I'm quite lost about how I should use them here.
I've tried something like the following:
D3D12_RESOURCE_BARRIER resource_barrier;
resource_barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
resource_barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
resource_barrier.UAV.pResource = out;
command_list->ResourceBarrier(1, &resource_barrier);
But does that really do what I want? I clearly only want to make sure that before the ray generation shader is invoked by DispatchRays, the stores from ClearUnorderedAccessViewFloat have finished and it is safe to read those values.
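To make the ordering I'm after concrete, here is a minimal sketch. It is not real D3D12 code: FakeCommandList is a hypothetical stand-in that only logs what would be recorded, so it runs without the Windows SDK. It assumes that a UAV barrier on out is what orders the clear before DispatchRays and DispatchRays before the compute pass, which is exactly the part I'm unsure about.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a D3D12 command list: it only logs the order in
// which commands would be recorded, so the sketch runs without the Windows SDK.
struct FakeCommandList {
    std::vector<std::string> log;

    void clear_uav(const std::string& res) {
        log.push_back("ClearUnorderedAccessViewFloat(" + res + ")");
    }
    void uav_barrier(const std::string& res) {
        log.push_back("UAVBarrier(" + res + ")");
    }
    void transition(const std::string& res, const std::string& before,
                    const std::string& after) {
        log.push_back("Transition(" + res + ": " + before + " -> " + after + ")");
    }
    void dispatch_rays() { log.push_back("DispatchRays"); }
    void dispatch() { log.push_back("Dispatch"); }
};

// Records one frame with GPU-side barriers where the CPU waits used to be.
// Assumption: a UAV barrier on `out` is enough to order each pair of passes.
std::vector<std::string> record_frame() {
    FakeCommandList cl;
    cl.clear_uav("out");                                      // step 1
    cl.uav_barrier("out");       // instead of wait_for_gpu after step 1
    cl.dispatch_rays();                                       // step 2
    cl.uav_barrier("out");       // instead of wait_for_gpu after step 2
    cl.transition("tex", "COPY_SOURCE", "UNORDERED_ACCESS");  // step 3
    cl.dispatch();                                            // step 4
    cl.transition("tex", "UNORDERED_ACCESS", "COPY_SOURCE");  // step 5
    return cl.log;
}
```

If this ordering is right, each log entry would correspond to one call on the real command list, and the only CPU-side wait left would be the one at the end of the frame.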
Using Barrier instead, it might be better to specify a D3D12_BARRIER_GROUP with a D3D12_TEXTURE_BARRIER and SyncBefore = D3D12_BARRIER_SYNC_CLEAR_UNORDERED_ACCESS_VIEW (though this enum value doesn't seem to be available in my version of d3d12.h) and SyncAfter = D3D12_BARRIER_SYNC_RAYTRACING.
Unfortunately, I personally think that the documentation is quite poor and I have no idea what would be the best way to use these things here. So, any help is highly appreciated.
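void wait_for_gpu()
{
    // Signal the fence once the queue has executed everything submitted so far.
    command_queue->Signal(fence, fence_values[frame_index]);
    // Block the CPU until the GPU reaches that fence value.
    fence->SetEventOnCompletion(fence_values[frame_index], fence_event);
    WaitForSingleObjectEx(fence_event, INFINITE, FALSE);
    ++fence_values[frame_index];
}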