MSVC 2013: how to properly enable AVX2?


I am trying to obtain a faster binary of a specific piece of software by turning on /arch:AVX2, since I am running it on CPUs that should support that instruction set (i7 4770 and i7 4800MQ).
However, doing so produces an executable that crashes with the message "xxx.exe has stopped working", as if I were running it on non-AVX2 hardware.
I know that AVX2 is correctly supported on my system because y-cruncher detects AVX2 hardware and runs the corresponding executable.
The same problem occurs if I specify /arch:AVX.
The code runs normally when built without any /arch option. The other build options used are:

/Ox /Ob2 /Oi /Oy /GT /GL /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope /GR /openmp /Gd /TP /GL /GF /Ot  /Qfast_transcendentals

The software itself does not use any AVX2 intrinsics, as it is designed to run on a broader set of platforms. I am just trying to obtain better performance on my platforms without changing the code (which I would not be able to do anyway, as this is a complex piece of software and I am not an expert programmer).

My question is: why would that option make the program crash on an AVX2-enabled machine? Am I missing something that prevents /arch:AVX2 from working correctly, such as an incompatibility with other flags? (I checked the MS docs and did not find any "cross dependency" related to AVX2.)
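As a way to double-check the hardware and OS side independently of y-cruncher, here is a minimal sketch of the usual CPUID/XGETBV test. The bit positions come from the processor documentation; the helper names (`avx_usable`, `avx2_usable`, `avx2_usable_here`) are made up for this illustration and are not part of the software in question. The MSVC-only part relies on `__cpuid`, `__cpuidex` and `_xgetbv`.

```cpp
#include <cstdint>

// Feature bits as documented for CPUID and XCR0.
constexpr std::uint32_t kOsxsaveBit = 1u << 27; // CPUID.1:ECX, OS uses XSAVE/XGETBV
constexpr std::uint32_t kAvxBit     = 1u << 28; // CPUID.1:ECX, CPU supports AVX
constexpr std::uint32_t kAvx2Bit    = 1u << 5;  // CPUID.(EAX=7,ECX=0):EBX, CPU supports AVX2
constexpr std::uint64_t kXcr0SseAvx = 0x6;      // XCR0 bits 1 (XMM) and 2 (YMM) state

// Hypothetical helper: AVX is usable only if the CPU reports it AND the OS
// has enabled saving of XMM and YMM state.
constexpr bool avx_usable(std::uint32_t leaf1_ecx, std::uint64_t xcr0) {
    return (leaf1_ecx & kOsxsaveBit) != 0 &&
           (leaf1_ecx & kAvxBit) != 0 &&
           (xcr0 & kXcr0SseAvx) == kXcr0SseAvx;
}

// Hypothetical helper: AVX2 additionally needs the leaf-7 feature bit.
constexpr bool avx2_usable(std::uint32_t leaf1_ecx, std::uint32_t leaf7_ebx,
                           std::uint64_t xcr0) {
    return avx_usable(leaf1_ecx, xcr0) && (leaf7_ebx & kAvx2Bit) != 0;
}

#if defined(_MSC_VER)
#include <intrin.h>
#include <immintrin.h>

// Queries the current machine. Only compiled with MSVC.
inline bool avx2_usable_here() {
    int r[4];
    __cpuid(r, 0);                    // r[0] = highest supported standard leaf
    if (r[0] < 7) return false;
    __cpuid(r, 1);
    const std::uint32_t ecx1 = static_cast<std::uint32_t>(r[2]);
    if ((ecx1 & kOsxsaveBit) == 0) return false; // XGETBV is not available
    const std::uint64_t xcr0 = _xgetbv(0);
    __cpuidex(r, 7, 0);
    return avx2_usable(ecx1, static_cast<std::uint32_t>(r[1]), xcr0);
}
#endif
```

If this check is used in practice, it probably belongs in a source file compiled without /arch:AVX or /arch:AVX2, so that the check itself cannot contain AVX instructions. A failed check would normally show up as an illegal-instruction exception rather than an access violation.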

Edit: I'm adding some more information about the code, as suggested by Regis Portalez. Here is the code snippet causing the problem. The VS debugger stops before the last line, indicating an access violation:

void Film::ComputeGroupScale(u_int i)
{
    const Color white(space.ToXYZ(RGB(1.f)));
    if (groups[i].temperature > 0.f) {
        Color colorTemp(SPD(groups[i].temperature));
        colorTemp /= colorTemp.Y();
        groups[i].convert = Adapter(white,
            space.ToXYZ(groups[i].rgbScale)) *
            Adapter(white, colorTemp);
    } else {
        groups[i].convert = Adapter(white,
            space.ToXYZ(groups[i].rgbScale));
    }
    groups[i].convert *= groups[i].globalScale;
}

Following is the assembly code for the last line:

}
    groups[i].convert *= groups[i].globalScale;
00007FFB28E43B2A  vbroadcastss ymm2,dword ptr [rax+rbx+40h]  
00007FFB28E43B31  mov         rax,qword ptr [rdi+1B8h]  
00007FFB28E43B38  vmulps      ymm0,ymm2,ymmword ptr [rax+rbx+54h]  
00007FFB28E43B3E  vmovups     ymmword ptr [rax+rbx+54h],ymm0  
00007FFB28E43B44  vmulss      xmm0,xmm2,dword ptr [rax+rbx+74h]  
00007FFB28E43B4A  vmovss      dword ptr [rax+rbx+74h],xmm0  
00007FFB28E43B50  vzeroupper  
}

The debugger indicates that the access violation happens at vbroadcastss. Register contents are as follows; with these values the load address rax+rbx+40h works out to 0x3F800040, which the debugger displays as 0:

RAX = 000000003F800000 RBX = 0000000000000000 RCX = 000007FEDC1143A8 RDX = 000000000021B6E8 
RSI = 0000000007709710 RDI = 0000000002D165E0 R8  = 000000000021B6B0 R9  = 0000000004D25190 
R10 = 00000000003B0274 R11 = 000000000021B428 R12 = 0000000000000001 R13 = 0000000000000000 
R14 = 0000000000000000 R15 = 0000000002288480 RIP = 000007FEDBDC2C0A RSP = 000000000021B690 
RBP = 000000000021B790 EFL = 00010340 

0x000000003f800040 = 00000000 
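To make the register dump easier to read, here is a small sketch that recomputes the address used by `vbroadcastss ymm2,dword ptr [rax+rbx+40h]` from the values above and decodes the low 32 bits of RAX as an IEEE-754 single. The helper names are invented for this illustration. The only point is an observation: 0x3F800000 is the bit pattern of 1.0f, so RAX may be holding a float value rather than a pointer into `groups`. This is not a confirmed diagnosis.

```cpp
#include <cstdint>

// Effective address of an operand of the form [base + index + disp].
constexpr std::uint64_t effective_address(std::uint64_t base, std::uint64_t index,
                                          std::uint64_t disp) {
    return base + index + disp;
}

// Fields of an IEEE-754 single-precision value given as raw bits.
constexpr unsigned f32_sign(std::uint32_t bits) { return bits >> 31; }
constexpr int f32_unbiased_exponent(std::uint32_t bits) {
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}
constexpr std::uint32_t f32_fraction(std::uint32_t bits) { return bits & 0x7FFFFFu; }

// Values taken from the register dump: RAX = 3F800000, RBX = 0, disp = 40h.
static_assert(effective_address(0x3F800000u, 0u, 0x40u) == 0x3F800040u,
              "matches the address shown by the debugger");

// Sign 0, exponent 0, fraction 0 decodes to +1.0 * 2^0, i.e. 1.0f.
static_assert(f32_sign(0x3F800000u) == 0 &&
              f32_unbiased_exponent(0x3F800000u) == 0 &&
              f32_fraction(0x3F800000u) == 0,
              "RAX holds the bit pattern of 1.0f");
```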

For comparison, this is the assembly when /arch:AVX2 is not used:

    groups[i].convert *= groups[i].globalScale;
000007FEDCA019E8  movups      xmm0,xmmword ptr [rax+rbx+54h]  
000007FEDCA019ED  shufps      xmm2,xmm2,0  
000007FEDCA019F1  mulps       xmm0,xmm2  
000007FEDCA019F4  movups      xmmword ptr [rax+rbx+54h],xmm0  
000007FEDCA019F9  movups      xmm0,xmmword ptr [rax+rbx+64h]  
000007FEDCA019FE  mulps       xmm0,xmm2  
000007FEDCA01A01  movups      xmmword ptr [rax+rbx+64h],xmm0  
000007FEDCA01A06  mulss       xmm2,dword ptr [rax+rbx+74h]  
000007FEDCA01A0C  movss       dword ptr [rax+rbx+74h],xmm2  
}
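Both listings appear to do the same work: they multiply nine consecutive floats starting at offset 54h by one broadcast scalar. The AVX2 build does this as one 8-wide `vmulps` plus one `vmulss`, the SSE build as two 4-wide `mulps` plus one `mulss`. A scalar sketch of that operation follows. `Mat9` is a hypothetical stand-in, since the real definition of `Adapter` is not shown, and nine floats would merely be consistent with a 3x3 matrix.

```cpp
// Hypothetical stand-in for the nine floats touched by the disassembly
// (offsets 54h through 74h). Not the real Adapter class.
struct Mat9 {
    float m[9];
};

// Scalar reference for what both instruction sequences compute:
// every element is multiplied by the same scale factor.
constexpr Mat9 scaled(Mat9 a, float s) {
    for (int i = 0; i < 9; ++i) {
        a.m[i] *= s;
    }
    return a;
}

// Elements 0..7 correspond to the single ymm multiply in the AVX2 build
// (or the two xmm multiplies in the SSE build); element 8 is the scalar tail.
static_assert(scaled(Mat9{{1, 2, 3, 4, 5, 6, 7, 8, 9}}, 2.0f).m[0] == 2.0f, "first element");
static_assert(scaled(Mat9{{1, 2, 3, 4, 5, 6, 7, 8, 9}}, 2.0f).m[8] == 18.0f, "scalar tail");
```

Both versions use unaligned loads and stores (`vmovups`, `movups`), so the odd 54h offset on its own should not cause a fault.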

The groups object involved in the access violation is defined as:

std::vector<Group> groups;

class Group {
public:
    Group(const string &n) : samples(0.f), name(n),
        globalScale(1.f), temperature(0.f),
        rgbScale(1.f), convert(Color(1.f), Color(1.f)),
        enable(true) { }
    ~Group() {
        for(vector<Buffer *>::iterator buffer = buffers.begin(); buffer != buffers.end(); ++buffer)
            delete *buffer;
    }

    void CreateBuffers(const vector<BufferConfig> &configs, u_int x, u_int y);

    Buffer *getBuffer(u_int index) const {
        return buffers[index];
    }
    double samples;
    vector<Buffer *> buffers;
    string name;
    float globalScale, temperature;
    RGB rgbScale;
    Adapter convert;
    bool enable;
};

I hope this information allows for some further analysis.
