I'm trying to get early fragment culling to work, based on the stencil test. My scenario is the following: I have a fragment shader that does a lot of work, but needs to be run only on very few fragments when I render my scene. These fragments can be located pretty much anywhere on the screen (I can't use a scissor to quickly filter out these fragments).
In rendering pass 1, I generate a stencil buffer with two possible values. Values will have the following meaning for pass 2:
- 0: do not do anything
- 1: OK to proceed (e.g. enter the fragment shader and render)
Pass 2 renders the actual scene. The stencil state is configured this way:
glStencilMask(1);
glStencilFunc(GL_EQUAL, 1, 1); // if the value is NOT 1, please early cull!
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP); // never write to stencil buffer
Now I run my app. The color of selected pixels is altered based on the stencil value, which means the stencil test works fine.
However, I should see a huge, spectacular performance boost with early stencil culling... but nothing happens. My guess is that the stencil test either happens after the depth test, or even after the fragment shader has been called. Why?
nVidia apparently has a patent on early stencil culling: http://www.freepatentsonline.com/7184040.html Is this the right way to get it enabled?
I'm using an nVidia GeForce GTS 450 graphics card. Is early stencil culling supposed to work with this card? Running Windows 7 with latest drivers.
 
                        
Like early Z, early stencil is often done using hierarchical stencil buffering.
There are a number of factors that can prevent hierarchical tiling from working properly, including rendering into an FBO on older hardware. However, the biggest obstacle to getting early stencil testing working in your example is that you've left stencil writes enabled for 1 of the 8 stencil bits in the second pass: glStencilMask(1) keeps bit 0 writable, even though your GL_KEEP ops never actually change it.
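To make the distinction concrete, here is a rough model in plain C of the two masks involved. It is only a sketch: `StencilState`, `stencil_passes_equal` and `stencil_writes_enabled` are names I made up for illustration, not GL calls, and how a driver actually decides whether to cull early is an assumption on my part. The point is that the third argument of glStencilFunc is the compare mask, while glStencilMask sets a separate write mask.

```c
#include <stdint.h>

/* Hypothetical model of the stencil state, for illustration only. */
typedef struct {
    uint8_t ref;          /* 2nd argument of glStencilFunc */
    uint8_t compare_mask; /* 3rd argument of glStencilFunc */
    uint8_t write_mask;   /* argument of glStencilMask */
} StencilState;

/* GL_EQUAL passes when (ref & mask) == (stored & mask). */
static int stencil_passes_equal(const StencilState *s, uint8_t stored)
{
    return (s->ref & s->compare_mask) == (stored & s->compare_mask);
}

/* Non-zero while at least one stencil bit is still writable.
 * Assumption: this is the kind of state a driver may look at
 * before it trusts an early stencil test. */
static int stencil_writes_enabled(const StencilState *s)
{
    return s->write_mask != 0;
}
```

With the state from the question (`{1, 1, 0x01}`), the test passes for a stored value of 1 and fails for 0, but `stencil_writes_enabled` is still true. With a write mask of `0x00` the test result is identical and no bit is writable.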
I would suggest calling glStencilMask(0x00) at the beginning of the second pass to let the GPU know you are not going to write anything to the stencil buffer.
There is an interesting read on early fragment testing as it is implemented in current generation hardware here. That entire blog is well worth reading if you have the time.
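Put together, the stencil setup for pass 2 might look roughly like this. `begin_pass2_stencil` is a hypothetical helper name, and the sketch assumes a GL context with a stencil buffer is already current. On Windows you would also need to include `<windows.h>` before `<GL/gl.h>`. Whether this actually triggers early culling on a given card and driver is not something the code can guarantee.

```c
#include <GL/gl.h>

/* Hypothetical helper: call once before drawing pass 2.
 * Assumes a current GL context with a stencil buffer. */
static void begin_pass2_stencil(void)
{
    glEnable(GL_STENCIL_TEST);

    /* Pass only where the stored value has bit 0 set.
     * The last argument is the compare mask, not the write mask. */
    glStencilFunc(GL_EQUAL, 1, 0x01);

    /* Never modify the stencil value, whatever the test result. */
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

    /* Write mask: no stencil bit is writable during this pass. */
    glStencilMask(0x00);
}
```

Keep in mind that the write mask also applies to glClear, so set it back (for example glStencilMask(0xFF)) before pass 1 clears or writes the stencil buffer on the next frame.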