While looking at the native code generated by Unity Burst to see whether it's leveraging SIMD, I have a hard time reading it because it is shown incompletely in the inspector.
For instance, some variables simply aren't shown: h1, h2, h3 and hv.
I've modified the attributes to avoid any kind of optimization, but the result is still the same:
[BurstCompile(OptimizeFor = OptimizeFor.FastCompilation)]
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
The Burst inspector: (screenshot not included)
The actual code:
[BurstCompile(OptimizeFor = OptimizeFor.FastCompilation)]
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
[MonoPInvokeCallback(typeof(FilterMethod))]
public static unsafe void VectorFullBandInner(
    in float* source, in float* target, in int length, in int stride, in int offset, ref Filter filter)
{
    ValidateArguments(source, target, length, stride, offset, ref filter);

    var h = filter.H;
    var z = filter.Z;
    var n = filter.HLength;
    var v = filter.VLength;

    for (var sample = 0; sample < length; sample += v)
    {
        var pos = Filter.UpdateZ(ref filter, source, sample);
        var sum = 0.0f;
        var tap = 0;
        int end;

        // Four taps per iteration.
        for (end = n - 4; tap < end; tap += 4)
        {
            var h0 = h[tap + 0];
            var h1 = h[tap + 1];
            var h2 = h[tap + 2];
            var h3 = h[tap + 3];

            var zP = pos - tap;
            var z0 = z[zP - 0];
            var z1 = z[zP - 1];
            var z2 = z[zP - 2];
            var z3 = z[zP - 3];

            var hv = new float4(h0, h1, h2, h3);
            var zh = new float4(z0, z1, z2, z3);

            sum += math.dot(hv, zh);
        }

        // Two taps per iteration.
        for (end = n - 1; tap < end; tap += 2)
        {
            var h0 = h[tap + 0];
            var h1 = h[tap + 1];

            var zP = pos - tap;
            var z0 = z[zP - 0];
            var z1 = z[zP - 1];

            var hv = new float2(h0, h1);
            var zv = new float2(z0, z1);

            sum += math.dot(hv, zv);
        }

        // Remaining taps, one at a time.
        for (end = n - 0; tap < end; tap += 1)
        {
            var h0 = h[tap];
            var zP = pos - tap;
            var z0 = z[zP - 0];

            sum += math.dot(h0, z0);
        }

        target[sample] = sum;
    }
}
These variables are used and can't be optimized out, but they're not shown.
I tried different options in the UI; even with full debug information the result is the same.
Am I missing something, or is this expected?
The Burst compiler optimizes struct operations by packing and vectorizing them.
Because h0 through h3 sit next to each other in memory, the four reads can be reinterpreted as a single wider "struct" read/move operation. So the h1, h2 and h3 declarations haven't disappeared; they have been collapsed into a single instruction at the place where h0 is read.
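To make that collapse concrete, here is a minimal sketch of what the four-tap loop body amounts to once the adjacent reads are merged. It is an illustration only, not the asker's code and not a statement of what Burst emits for any particular target or setting. The class and method names (WideLoadSketch, DotScalarLoads, DotWideLoads) are hypothetical; the pointer cast to float4 and the .wzyx swizzle come from Unity.Mathematics.

```csharp
using Unity.Mathematics;

// Hypothetical helper, for illustration only.
public static unsafe class WideLoadSketch
{
    // The form written in the question: four separate scalar reads per vector.
    public static float DotScalarLoads(float* h, float* z, int tap, int pos)
    {
        var hv = new float4(h[tap + 0], h[tap + 1], h[tap + 2], h[tap + 3]);

        var zP = pos - tap;
        var zv = new float4(z[zP - 0], z[zP - 1], z[zP - 2], z[zP - 3]);

        return math.dot(hv, zv);
    }

    // The same computation with the adjacent reads written as one wide read each.
    public static float DotWideLoads(float* h, float* z, int tap, int pos)
    {
        // h[tap] .. h[tap + 3] are contiguous and ascending: a single float4 read.
        var hv = *(float4*)(h + tap);

        // z is read in descending order, so read z[zP - 3] .. z[zP] in one go
        // and reverse the lanes with the wzyx swizzle.
        var zP = pos - tap;
        var zv = (*(float4*)(z + zP - 3)).wzyx;

        return math.dot(hv, zv);
    }
}
```

Both methods read the same eight floats and return the same value. When the compiler performs the merge itself, the inspector has no separate instructions left to attribute to h1, h2 and h3. On x86 one would typically look for a single packed load (for example movups) where four scalar loads (movss) might have been expected, though the exact output depends on the target and settings.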