Intel C++ optimizer removes masm code


I recently started using the Intel C++ compiler for some of my projects while also learning MASM assembly. I kept hearing that it isn't worth learning assembly because compilers already do a good job of optimizing code, so I decided to see for myself which one is faster. To do so, I wrote the following C++ code:

#include <iostream>
#include <time.h>

using namespace std;

extern "C" {
int Add(int a, int b);
}


int main(int argc, char * argv[]){
        int startingTime = clock();
        for (int i = 0; i < 100; i++)
        {
            cout << "normal: " << i << endl;
            cout << 1000 + 1000 << endl;
        }
        int timeTaken1 = clock() - startingTime;

        startingTime = clock();
        for (int i = 0; i < 100; i++){
             cout << "assem" << i << endl;
             cout << Add(2000, 2000) << endl;
        }
        int timeTaken2 = clock() - startingTime;

        cout << "Time taken under normal addition: " << timeTaken1 << endl;
        cout << "Time taken under assembly addition: " << timeTaken2 << endl;

        cin.get();
        return 0;
   }

And the following MASM code:

.model flat
.386

.code

    public _Add

_Add PROC
        push ebp            ;
        mov ebp, esp        ;
        mov eax, [ebp + 8]  ;
        mov ebx, [ebp + 12] ;
        add eax, ebx        ;
        leave               ; cleanup
        ret                 ;


_Add endp
end

I am using Visual Studio to compile this, with the Intel Composer plugin. When I run this in Debug mode, it works perfectly: I can see "normal 99" and "assem 99" along with the relevant number. When I run this with /Od specified for the compiler, it also works fine. However, when /O2, /Ox or /O3 is specified, it only shows the normal (i+j) addition loop and the first value of the assembler addition, i.e. only assem 0 and 4000 are shown.

My guess is that the assembly code is being optimized out by the Intel compiler (this works fine with the VC++ compiler). I am curious to find out why this is occurring and how it can be worked around while still letting Intel optimize the C++ part.

Thanks SbSpider

EDIT: I know this is late, but thanks for all of the replies. It seems that the problem was an error in the assembly code rather than the Intel compiler not using the assembly code.


There is 1 answer

Ross Ridge On BEST ANSWER

Your assembly code is trashing the EBX register (as Jongware noted), and this is likely why the second loop in your C++ code is only executed once. If i is being stored in EBX, then changing EBX to 2000 in Add will cause the next test of the loop condition i < 100 to fail.

You need to either save and restore the EBX register in your assembly code, or pick another register that isn't assumed to be preserved across function calls (EAX, EDX, or ECX).