Rendering in DirectX 11

2.9k views Asked by At

When frame starts, I do my logical update and render after that. In my render code I do usual stuff. I set few states, buffors, textures, and end by calling Draw.

m_deviceContext->Draw(
        nbVertices,
        0);

At frame end I call present to show rendered frame.

// Present the back buffer to the screen since rendering is complete.
if(m_vsync_enabled)
{
    // Lock to screen refresh rate.
    m_swapChain->Present(1, 0);
}
else
{
    // Present as fast as possible.
    m_swapChain->Present(0, 0);
}

Usual stuff. Now, when I call Draw, according to MSDN

Draw submits work to the rendering pipeline.

Does it mean that data is send to GPU and main thread (the one called Draw) continues? Or does it wait for rendering to finish?

In my opinion, only Present function should make main thread wait for rendering to finish.

1

There are 1 answers

0
Chuck Walbourn On BEST ANSWER

There are a number of calls which can trigger the GPU to start working, Draw being one. Other's include Dispatch, CopyResource, etc. What the MSDN docs are trying to say is that stuff like PSSetShader. IASetPrimitiveTopology, etc. doesn't really do anything until you call Draw.

When you call Present that is taken as an implicit indicator of 'end of frame' but your program can often continue on with setting up rendering calls for the next frame well before the first frame is done and showing. By default, Windows will let you queue up to 3 frames ahead before blocking your CPU thread on the Present call to let the GPU catch-up--in real-time rendering you usually don't want the latency between input and render to be really high.

The fact is, however, that GPU/CPU synchronization is complicated and the Direct3D runtime is also batcning up requests to minimize kernel-call overhead so the actual work could be happing after many Draws are submitted to the command-queue. This old article gives you the flavor of how this works. On modern GPUs, you can also have various memory operations for paging in memory, setting up physical video memory areas, etc.

BTW, all this 'magic' doesn't exist with Direct3D 12 but that means the application has to do everything at the 'right' time to ensure it is both efficient and functional. The programmer is much more directly building up command-queues, triggering work on various pixel and compute GPU engines, and doing all the messy stuff that is handled a little more abstracted and automatically by Direct3 11's runtime. Even still, ultimately the video driver is the one actually talking to the hardware so they can do other kinds of optimizations as well.

The general rules of thumb here to keep in mind:

  • Creating resources is expensive, especially runtime shader compilation (by HLSL complier) and runtime shader blob optimization (by driver)
  • Copying resources to the GPU (i.e. loading texture data from the CPU memory) requires bus bandwidth that is limited in supply: Prefer to keep textures, VB, and IB data in Static buffers you reuse.
  • Copying resources from the GPU (i.e. moving GPU memory to CPU memory) uses a backchannel that is slower than going to the GPU: try to avoid the need for readback from the GPU
  • Submitting larger chunks of geometry per Draw call helps to amortize overhead (i.e. calling draw once for 10,000 triangles with the same state/shader is much faster than calling draw 10 times for a 1000 triangles each with changing state/shaders between).