I am trying to obtain a faster binary for a specific piece of software by turning on /arch:AVX2, since I am running it on CPUs that should support that instruction set (i7 4770 and i7 4800MQ).
However, doing so produces an executable that crashes with the message "xxx.exe has stopped working", as if I were running it on non-AVX2 hardware.
I know that AVX2 is correctly supported on my system by running y-cruncher, which detects AVX2 hardware and runs the corresponding executable.
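For anyone who wants to confirm AVX2 availability without a third-party tool, here is a minimal sketch of an independent check. It is not part of the software in question, and the function name cpu_has_avx2 is hypothetical. It assumes an x86 target and either MSVC (using the __cpuid, __cpuidex and _xgetbv intrinsics) or GCC/Clang (using __builtin_cpu_supports). On any other target it simply reports false.

```cpp
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>     // __cpuid, __cpuidex
#include <immintrin.h>  // _xgetbv
#endif

// Hypothetical helper: returns true when both the CPU and the OS
// report usable AVX2 support, false otherwise.
bool cpu_has_avx2()
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};

    // Leaf 0: EAX holds the highest supported standard leaf.
    __cpuid(regs, 0);
    if (regs[0] < 7)
        return false;

    // Leaf 1: ECX bit 27 = OSXSAVE, ECX bit 28 = AVX.
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;
    const bool avx = (regs[2] & (1 << 28)) != 0;
    if (!osxsave || !avx)
        return false;

    // XCR0 bits 1 and 2: the OS saves and restores XMM and YMM state.
    if ((_xgetbv(0) & 0x6) != 0x6)
        return false;

    // Leaf 7, subleaf 0: EBX bit 5 = AVX2.
    __cpuidex(regs, 7, 0);
    return (regs[1] & (1 << 5)) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // GCC and Clang expose the same information through a builtin.
    return __builtin_cpu_supports("avx2") != 0;
#else
    // Assumption: non-x86 targets are treated as lacking AVX2.
    return false;
#endif
}
```

Calling cpu_has_avx2() once at startup and printing the result would show whether the process itself sees AVX2 as usable, which is the same kind of information y-cruncher reports.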
The same problem occurs if I specify /arch:AVX.
The code runs normally when built without any /arch option. The other build options used are:
/Ox /Ob2 /Oi /Oy /GT /GL /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope /GR /openmp /Gd /TP /GL /GF /Ot /Qfast_transcendentals
The software in itself does not make use of specific AVX2 intrinsics, as it is designed to run on a broader set of platforms. I am just trying to obtain better performance on my platforms, without changing the code (which I would not even be able to do, as this is a complex piece of software and I am no expert programmer at all).
My question is: why would that option make the program crash on an AVX2-enabled machine? Am I missing something that prevents /arch:AVX2 from working correctly, such as an incompatibility with other flags? (I checked the MS docs and did not find any "cross dependency" related to AVX2.)
Edit: I'm adding some more information about the code, as suggested by Regis Portalez. Here is the code snippet causing the problem. The VS debugger stops at the last statement, reporting an access violation:
void Film::ComputeGroupScale(u_int i)
{
    const Color white(space.ToXYZ(RGB(1.f)));
    if (groups[i].temperature > 0.f) {
        Color colorTemp(SPD(groups[i].temperature));
        colorTemp /= colorTemp.Y();
        groups[i].convert = Adapter(white,
            space.ToXYZ(groups[i].rgbScale)) *
            Adapter(white, colorTemp);
    } else {
        groups[i].convert = Adapter(white,
            space.ToXYZ(groups[i].rgbScale));
    }
    groups[i].convert *= groups[i].globalScale;
}
The following is the disassembly for the last statement:
}
groups[i].convert *= groups[i].globalScale;
00007FFB28E43B2A vbroadcastss ymm2,dword ptr [rax+rbx+40h]
00007FFB28E43B31 mov rax,qword ptr [rdi+1B8h]
00007FFB28E43B38 vmulps ymm0,ymm2,ymmword ptr [rax+rbx+54h]
00007FFB28E43B3E vmovups ymmword ptr [rax+rbx+54h],ymm0
00007FFB28E43B44 vmulss xmm0,xmm2,dword ptr [rax+rbx+74h]
00007FFB28E43B4A vmovss dword ptr [rax+rbx+74h],xmm0
00007FFB28E43B50 vzeroupper
}
The debugger reports that the access violation happens at vbroadcastss. The register contents below show that the instruction attempts to read from rax+rbx+40h, i.e. address 0x3F800040:
RAX = 000000003F800000 RBX = 0000000000000000 RCX = 000007FEDC1143A8 RDX = 000000000021B6E8
RSI = 0000000007709710 RDI = 0000000002D165E0 R8 = 000000000021B6B0 R9 = 0000000004D25190
R10 = 00000000003B0274 R11 = 000000000021B428 R12 = 0000000000000001 R13 = 0000000000000000
R14 = 0000000000000000 R15 = 0000000002288480 RIP = 000007FEDBDC2C0A RSP = 000000000021B690
RBP = 000000000021B790 EFL = 00010340
0x000000003f800040 = 00000000
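The address arithmetic behind that read can be sketched as follows. The helper names effective_address and float_from_bits are hypothetical and used only for illustration. The sketch assumes the faulting operand is [rax+rbx+40h] with the RAX and RBX values listed above. It also shows that 0x3F800000 is the IEEE-754 single-precision bit pattern of 1.0f, which happens to match the default globalScale. This is only an observation, not a confirmed diagnosis.

```cpp
#include <cstdint>
#include <cstring>

// Effective address of an x86 memory operand of the form
// [base + index + displacement], with a scale factor of 1.
std::uint64_t effective_address(std::uint64_t base,
                                std::uint64_t index,
                                std::uint64_t displacement)
{
    return base + index + displacement;
}

// Reinterprets a 32-bit pattern as an IEEE-754 single-precision float.
// memcpy is used to avoid undefined behaviour from pointer casts.
float float_from_bits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float is assumed to be 32 bits wide");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

With RAX = 0x3F800000 and RBX = 0, effective_address(0x3F800000, 0, 0x40) yields 0x3F800040, the location shown in the memory line above, and float_from_bits(0x3F800000) yields 1.0f. If RAX were a valid pointer into the vector's storage, it would normally look like the other heap addresses in the dump rather than like a float constant.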
For comparison, this is the assembly generated when /arch:AVX2 is not used:
groups[i].convert *= groups[i].globalScale;
000007FEDCA019E8 movups xmm0,xmmword ptr [rax+rbx+54h]
000007FEDCA019ED shufps xmm2,xmm2,0
000007FEDCA019F1 mulps xmm0,xmm2
000007FEDCA019F4 movups xmmword ptr [rax+rbx+54h],xmm0
000007FEDCA019F9 movups xmm0,xmmword ptr [rax+rbx+64h]
000007FEDCA019FE mulps xmm0,xmm2
000007FEDCA01A01 movups xmmword ptr [rax+rbx+64h],xmm0
000007FEDCA01A06 mulss xmm2,dword ptr [rax+rbx+74h]
000007FEDCA01A0C movss dword ptr [rax+rbx+74h],xmm2
}
The groups object involved in the access violation is defined as:
std::vector<Group> groups;
class Group {
public:
    Group(const string &n) : samples(0.f), name(n),
        globalScale(1.f), temperature(0.f),
        rgbScale(1.f), convert(Color(1.f), Color(1.f)),
        enable(true) { }
    ~Group() {
        for(vector<Buffer *>::iterator buffer = buffers.begin(); buffer != buffers.end(); ++buffer)
            delete *buffer;
    }
    void CreateBuffers(const vector<BufferConfig> &configs, u_int x, u_int y);
    Buffer *getBuffer(u_int index) const {
        return buffers[index];
    }
    double samples;
    vector<Buffer *> buffers;
    string name;
    float globalScale, temperature;
    RGB rgbScale;
    Adapter convert;
    bool enable;
};
I hope this information allows some further analysis.