I'm trying to work with AVX instructions on 64-bit Windows. I'm comfortable with the g++ compiler, so I've been using that. However, there is a serious bug reported here, and only very rough solutions were presented here.
Basically, an __m256 variable can't be reliably aligned on the stack, and it needs 32-byte alignment to work properly with aligned AVX instructions.
The solutions presented in the other Stack Overflow question I linked are really terrible, especially if you have performance in mind. One is a Python program, which you would have to run every time you want to debug, that replaces the aligned instructions with their sub-optimal unaligned counterparts. The other is over-allocating and doing a bunch of costly, hacky pointer math in code to get proper alignment. If you go with the pointer math solution, I think there is still a chance of a segfault, because you can't control the allocation or r-values / temporaries.
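For reference, the pointer math workaround amounts to something like the sketch below. It is only an illustration of the idea, not code from the linked question: the helper names `align32` and `is_aligned32` are hypothetical, and it relies on the standard `std::align` from `<memory>` to do the rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: true if p sits on a 32-byte boundary.
inline bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Hypothetical helper: returns a 32-byte-aligned pointer inside a raw buffer,
// or nullptr if needed_bytes no longer fit after rounding the address up.
// The caller must over-allocate the raw buffer by up to 31 bytes.
inline float* align32(void* raw, std::size_t raw_bytes, std::size_t needed_bytes) {
    void* p = raw;
    std::size_t space = raw_bytes;
    void* aligned = std::align(32, needed_bytes, p, space);
    assert(aligned == nullptr || is_aligned32(aligned));
    return static_cast<float*>(aligned);
}
```

A caller would declare something like `unsigned char raw[8 * sizeof(float) + 31];` and then work through `align32(raw, sizeof raw, 8 * sizeof(float))`. This only covers buffers you declare yourself. It does nothing for temporaries the compiler creates on its own.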
I'm looking for an easier and cheaper solution. I would prefer not to switch compilers, but I will if that's the best solution. However, my very poor understanding of the bug is that it is intrinsic to 64-bit Windows. Would switching compilers help, or do other compilers have the same issue?
You can solve this problem by switching to Microsoft's 64-bit C/C++ compiler. The problem is not intrinsic to 64-bit Windows. Despite what Kai Tietz said in the bug report you linked, Microsoft's x64 ABI does allow a compiler to give variables a greater than 16-byte alignment on the stack.
Also, Cygwin's 64-bit version of GCC 4.9.2 can give variables 32-byte alignment on the stack.
Clang for Windows also produces working executables with AVX, and it is a good choice if you care about how well the code is optimized.
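If you want to check a particular compiler yourself, a small probe along these lines may help. It is a minimal sketch under a few assumptions: `Vec8f` is a hypothetical stand-in for `__m256` (so the probe builds without AVX flags), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for __m256: 32 bytes that demand 32-byte alignment.
struct alignas(32) Vec8f {
    float lane[8];
};

// Returns true if a local (stack) Vec8f really landed on a 32-byte boundary.
bool stack_variable_is_aligned() {
    volatile Vec8f v = {};  // volatile keeps the variable on the stack
    // Route the address through a volatile so the compiler cannot simply
    // assume the declared alignment and fold the check to "true".
    volatile std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&v);
    return addr % alignof(Vec8f) == 0;
}

// Aborts in debug builds if the compiler did not honor the alignment.
void require_aligned_stack() {
    assert(stack_variable_is_aligned());
}
```

With a compiler that aligns the stack correctly, `stack_variable_is_aligned()` should return true on every call. On an affected GCC build for 64-bit Windows it may return false, depending on where the stack pointer happens to be, so it is worth calling it from a few different call depths.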