cg: Vertex output struct corrupted by different member order? Profile violation or cg bug?


I've been tinkering with cg shaders for Retroarch, and I've encountered what appears to be a strange bug in the Cg Toolkit's compiler or code generator...or something. Consider the three-pass shader found here which simulates a CRT TV: https://github.com/libretro/common-shaders/tree/master/crt/crt-interlaced-halation

In particular, consider the final pass: https://github.com/libretro/common-shaders/blob/master/crt/crt-interlaced-halation/crt-interlaced-halation-pass2.cg

As it stands, the shader output works as expected. If you comment out the "#define CURVATURE" at the top of this file (which simulates the curvature of a CRT TV), the shader output also works as expected. However, it's very particular to the member order of the vertex shader output struct here:

struct out_vertex {
    float4 position : POSITION;
    float4 color : COLOR;
    float2 texCoord : TEXCOORD0;
    float2 one;
    float mod_factor;
    float2 ilfac;
    float3 stretch;
    float2 sinangle;
    float2 cosangle;
};

If you rearrange the order to the following, you will get corrupted output:

struct out_vertex {
    float4 position : POSITION;
    float4 color : COLOR;
    float2 texCoord : TEXCOORD0;
    float2 cosangle;
    float2 one;
    float mod_factor;
    float2 ilfac;
    float3 stretch;
    float2 sinangle;
};

With that order, my desktop's nvidia card gives me a black screen, and my laptop's ATI card gives me bizarre artifacts that look like broken texture coordinates. The exact nature of the error therefore depends on the GPU or drivers, but the presence of the error is vendor/driver-agnostic, so it appears to be a bug in the cg compiler that causes the varying attributes to become corrupt. There's pretty much no end to the kinds of corruption you can get. For instance, other member rearrangements break the "mod_factor" variable (which stores the x pixel coordinate of the output), causing the alternating magenta/green pixel tints to get stuck on one or the other and blanketing the entire image with a single tint. Still others cause a black screen except for the halation/bloom contribution, etc.

The issue does not occur in this particular shader if you reenable "#define CURVATURE", but it doesn't stem from errors in the "flat" codepath itself: in the part of the fragment shader inside the "#ifdef CURVATURE" block, you can replace the final value with "xy = VAR.texCoord;" (the same value used by the uncurved version) and get flat output without any errors. (EDIT: Oops, this isn't actually true with this particular shader, but it was in my own version. I should have checked that before making the same assessment about this "simplified" example.) In reality, the fact that the flat codepath triggers the corruption while the curved codepath doesn't seems to indicate it has something to do with the curved codepath reading more of the varying attributes in the fragment shader (and maybe the read order or usage matters too?), but I haven't yet found a rhyme or reason to it. I have my own drastically different forked WIP where the same bizarre issues affect a curved codepath as well, but I'd rather keep that to myself until it's ready.

So, I guess I have a few questions:

  • Has anyone else seen anything like this?
  • Is this nondeterminism simply expected with output struct members that aren't explicitly associated with any semantics?
  • Could this corruption be coming from cg shader profile limits I'm unaware of? I have no idea what shader profile Retroarch compiles for, but I can see this kind of corruption occurring if the size of the vertex output struct exceeds some maximum allowed size.
  • Are there any other possibilities I might be overlooking? I considered driver errors, but that went out the window once I realized it affects both nvidia and ATI hardware. Still, I want to do my homework before informing nvidia that the Cg Toolkit seems to have a bug...

Thanks for any insights! :)

1 Answer

Mike S (best answer):

It turns out the problem has everything to do with relying on cg's auto-assigned semantics. I'll copy/paste the comment I left on the question:

I'm starting to think the problem might have something to do with relying on cg to auto-assign semantics: If for instance cg associates a value with a full float range to a semantic that clamps to [0.0, 1.0], that would obviously cause issues. mod_factor, ilfac, and stretch would all fall into that category, and sinangle and cosangle could be in [-1, 1], so the same probably applies to them. The assignment of semantics is likely to be affected by dead code elimination, which would explain the differences with and without "#define CURVATURE." I'll have to test this hypothesis though...
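To make the hypothesis concrete, here's a minimal sketch (the struct names and the particular semantic choices are mine, purely for illustration, not what cg actually picked in this shader):

struct clamped_vertex {
    float4 position   : POSITION;
    float  mod_factor : COLOR1;     // COLOR1 clamps to [0, 1]: e.g. 512.0 arrives as 1.0
};

struct unclamped_vertex {
    float4 position   : POSITION;
    float  mod_factor : TEXCOORD1;  // TEXCOORDn interpolates the full float range
};

If the auto-assigner happens to route mod_factor (a pixel coordinate, so usually much larger than 1) through something like COLOR1, the fragment shader only ever sees 1.0, which matches the "tint stuck on one color" symptom described above.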

There are only a limited number of semantics available depending on the profile (see this specification), and (though I may be mistaken) Retroarch appears to use a lower profile, where only the following are available:

  • POSITION: must be set to the clipspace vertex position, not only because it informs the rasterizer, but also because it apparently can't even be read from the fragment shader.
  • COLOR0 and COLOR1: values are clamped to the [0, 1] range.
  • TEXCOORD0-7: safe for any scalar or vector float value.
  • FOG: safe for any scalar float value.

The BCOL0/BCOL1 semantics probably clamp too in the profiles that support them, while PSIZE and CLP0-5 probably don't. The overall lesson seems to be that letting the cg compiler auto-assign semantics for values outside the [0, 1] range is like playing Russian roulette: you never know whether they'll end up associated with the clamped semantics, and the auto-assignment changes depending on the specifics of the shader code. For that reason, you need to explicitly assign semantics so that values potentially outside [0, 1] get paired with something like TEXCOORD0-7, or FOG for a scalar float.
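As a concrete example, here's one way the pass-2 output struct could be written with explicit semantics (the specific TEXCOORD slots below are just one possible assignment I'm suggesting, not necessarily what the fixed shader uses):

struct out_vertex {
    float4 position   : POSITION;
    float4 color      : COLOR;
    float2 texCoord   : TEXCOORD0;
    // Pin every remaining varying to an unclamped TEXCOORD slot so the
    // compiler's auto-assignment can never move it onto a clamped COLORn:
    float2 one        : TEXCOORD1;
    float  mod_factor : TEXCOORD2;
    float2 ilfac      : TEXCOORD3;
    float3 stretch    : TEXCOORD4;
    float2 sinangle   : TEXCOORD5;
    float2 cosangle   : TEXCOORD6;
};

With every member explicitly bound, the declaration order no longer matters, since nothing can silently land on a clamped semantic.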