I was trying to run a little game/demo called Land of Dreams, written by a friend of mine mostly for educational purposes. I noticed some extremely strange behaviour on my computer, even though the application had reportedly run successfully on several nVidia and ATI GPUs with relatively recent drivers.
The fragment shader is similar to the following:
#version 150
#define SHADOW_MAP_NUM 32

in vec3 w_vPos;

uniform sampler2DArrayShadow uShadowMap;
uniform mat4 uShadowCP[SHADOW_MAP_NUM];
uniform int uNumUsedShadowMaps;

out vec4 vFragColor;

const float kMaxShadow = 0.8;

// Returns how much light reaches the fragment: 1.0 = fully lit, 0.0 = fully in shadow.
float Visibility() {
  float bias = 0.01;
  float visibility = 1.0;
  int num_shadow_casters = min(uNumUsedShadowMaps, SHADOW_MAP_NUM);

  // For every shadow caster
  for (int i = 0; i < num_shadow_casters; ++i) {
    vec4 shadowCoord = uShadowCP[i] * vec4(w_vPos, 1.0);
    visibility -= kMaxShadow * (1.0 - texture(
      uShadowMap,
      vec4( // x, y, slice, depth
        shadowCoord.xy, i,
        (shadowCoord.z - bias) / shadowCoord.w
      )
    ));
    if (visibility <= 0.0) {
      return 0.0;
    }
  }
  return visibility;
}

void main() {
  vec3 color = vec3(Visibility());
  vFragColor = vec4(color, 1.0);
}
Of course, the real fragment shader is more complicated, but Visibility()
is where the problems lie.
In particular, artefacts appear when multiple objects cast shadows onto the same fragment. In this case the if (visibility <= 0.0) early return kicks in; returning from inside the loop is a bit unconventional, but I believe it is valid GLSL.
While experimenting a bit, we managed to get rid of the artefacts by removing the early return and replacing the last line of Visibility() with
return max(visibility, 0.0);
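Spelled out, the end of Visibility() then reads roughly like this (the same loop as above, just without the early exit):

// Same accumulation as before, but no early exit from the loop.
for (int i = 0; i < num_shadow_casters; ++i) {
  vec4 shadowCoord = uShadowCP[i] * vec4(w_vPos, 1.0);
  visibility -= kMaxShadow * (1.0 - texture(
    uShadowMap,
    vec4(shadowCoord.xy, i, (shadowCoord.z - bias) / shadowCoord.w)
  ));
}
return max(visibility, 0.0);  // clamp instead of returning early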
Now an even stranger thing happened: one fewer shadow was cast than before. In fact, we managed to bring the missing shadow back by replacing the loop header with
for(int i = -1; i < num_shadow_casters - 1; ++i) {
Of course, this does not make any sense at all! The uniform array of matrices was filled with exactly the same code as before, so there should not be any difference, and indexing a uniform array with a negative index is plain invalid anyway.
By adding the following statement where the early return used to be, the indices went back to normal:
if (visibility > 99.0) {
  return 0.0;
}
Of course, the branch above is effectively a no-op, since visibility starts at 1.0 and only ever decreases in the loop. The fragment shader behaves the same way with #pragma optimize(off), i.e. the bogus branch is needed for correct output even with optimisation disabled.
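Putting the pieces together, the loop that finally renders correctly on my machine looks roughly like this (the no-op branch sits exactly where the early return used to be, and the max() clamp from above is kept):

for (int i = 0; i < num_shadow_casters; ++i) {
  vec4 shadowCoord = uShadowCP[i] * vec4(w_vPos, 1.0);
  visibility -= kMaxShadow * (1.0 - texture(
    uShadowMap,
    vec4(shadowCoord.xy, i, (shadowCoord.z - bias) / shadowCoord.w)
  ));
  if (visibility > 99.0) {  // never true, yet removing it breaks the indexing
    return 0.0;
  }
}
return max(visibility, 0.0);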
The artefacts are only produced on my machine, with the following GPU and driver (relevant lines taken from glxinfo):
server glx vendor string: VirtualGL
server glx version string: 1.4
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GT 540M/PCIe/SSE2
OpenGL version string: 4.4.0 NVIDIA 331.20
OpenGL shading language version string: 4.40 NVIDIA via Cg compiler
The server GLX vendor is VirtualGL because I am using Bumblebee/Primus to render with my Optimus graphics card. I run Arch Linux with kernel 3.11.6.
Could someone shed some light on what is going on? I tried different nVidia driver versions with the same results, so I think it is improbable that we are facing a driver bug. What is wrong with the fragment shader above?
UPDATE: I realized that the versions of the fragment shader with branches inside the loop are examples of non-uniform control flow, which comes with the following warning:
If the accessed texture uses mipmapping or anisotropic filtering of any kind, then any texture function that is not "Lod" or "Grad" will retrieve undefined results.
Because our shadow maps use neither mip-mapping nor anisotropic filtering, I don't think this rule applies. Still, what fixes the broken indexing is putting a branch back, so maybe we are triggering some driver behaviour connected to non-uniform control flow here?
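To make it concrete, the per-fragment branch in question is the early return: its condition depends on visibility, which differs between neighbouring fragments, so fragments within the same 2x2 quad can leave the loop at different iterations:

// Non-uniform: whether we exit the loop depends on per-fragment data.
if (visibility <= 0.0) {
  return 0.0;
}

The loop bound itself comes from a uniform (uNumUsedShadowMaps), so as far as I can tell the trip count alone is dynamically uniform; only the early return (and, presumably, the bogus branch) introduces per-fragment divergence.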