Understanding assembly of a simple C program


I am trying to understand the assembly of this simple C program.

#include<stdio.h>
#include<unistd.h>
#include<fcntl.h>
#include<string.h>
void foobar(char *a){
    char c = a[0];
}
int main(){
    int fd = open("file.txt", O_RDONLY);
    char buf1[100]="\0";
    char buf[100];
    int aa=0,b=1,c=2,d=3,f=2,g=3;
    read(fd,buf1,104);
    if(strlen(buf1) > 100){

    }else{
        strcpy(buf,buf1);
    }
    //strcpy(buf,buf1);
    foobar(buf1);
}

Disassembling the executable with gdb gave the following for foobar:

   0x000000000040067d <+0>: push   rbp
   0x000000000040067e <+1>: mov    rbp,rsp
   0x0000000000400681 <+4>: mov    QWORD PTR [rbp-0x18],rdi
   0x0000000000400685 <+8>: mov    rax,QWORD PTR [rbp-0x18]
   0x0000000000400689 <+12>:    movzx  eax,BYTE PTR [rax]
   0x000000000040068c <+15>:    mov    BYTE PTR [rbp-0x1],al
   0x000000000040068f <+18>:    pop    rbp
   0x0000000000400690 <+19>:    ret    

Disassembly of main around the call to foobar:

   0x0000000000400784 <+243>:   lea    rax,[rbp-0xf0]
   0x000000000040078b <+250>:   mov    rdi,rax
   0x000000000040078e <+253>:   call   0x40067d <foobar>
   0x0000000000400793 <+258>:   mov    rbx,QWORD PTR [rbp-0x18]
   0x0000000000400797 <+262>:   xor    rbx,QWORD PTR fs:0x28
   0x00000000004007a0 <+271>:   je     0x4007a7 <main+278>

Now I have a question regarding the disassembly of foobar:

0x0000000000400681 <+4>:    mov    QWORD PTR [rbp-0x18],rdi
0x0000000000400685 <+8>:    mov    rax,QWORD PTR [rbp-0x18]

Wouldn't the single instruction

mov rax, rdi

do the work of the two instructions above? Why use the extra memory location rbp-0x18 for rdi? Is it related to pass by reference?

Edit: Another question: why is foobar accessing a location (rbp-0x18) that is not in foobar's frame?

My gcc version is gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

Edit: After compiling with the -O1, -O2, or -O3 optimization flag, the foobar assembly changes to

   0x0000000000400670 <+0>: repz ret 

and with the -O3 flag, part of the disassembly of main is

   0x0000000000400551 <+81>:    rep stos QWORD PTR es:[rdi],rax
   0x0000000000400554 <+84>:    mov    DWORD PTR [rdi],0x0
   0x000000000040055a <+90>:    mov    cl,0x64
   0x000000000040055c <+92>:    mov    edi,r8d
   0x000000000040055f <+95>:    call   0x4004b0 <__read_chk@plt>
   0x0000000000400564 <+100>:   mov    rdx,QWORD PTR [rsp+0x68]
   0x0000000000400569 <+105>:   xor    rdx,QWORD PTR fs:0x28
   0x0000000000400572 <+114>:   jne    0x400579 <main+121>
   0x0000000000400574 <+116>:   add    rsp,0x78
   0x0000000000400578 <+120>:   ret    
   0x0000000000400579 <+121>:   call   0x4004c0 <__stack_chk_fail@plt>

I can't find any call to foobar in main.


There are 2 answers

Basile Starynkevitch

As several people commented, you should compile with some optimizations, e.g. at least with gcc -O1 (and preferably gcc -O2).

If compiling with GCC specifically, I suggest also passing -fverbose-asm, since it emits helpful comments in the generated assembler file.

Here is the relevant listing, compiled with GCC 5.1 on Debian/Sid/amd64 using gcc-5 -O2 -fverbose-asm -S go.c; then look at the produced go.s assembler file with a pager:

        .section        .text.unlikely,"ax",@progbits
.LCOLDB0:
        .text
.LHOTB0:
        .p2align 4,,15
        .globl  foobar
        .type   foobar, @function
foobar:
.LFB25:
        .cfi_startproc
        rep ret
        .cfi_endproc
.LFE25:
        .size   foobar, .-foobar
        .section        .text.unlikely
.LCOLDE0:
        .text
.LHOTE0:
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC1:
        .string "file.txt"
        .section        .rodata
.LC2:
        .string ""
        .string ""
        .zero   98
        .section        .text.unlikely
.LCOLDB3:
        .section        .text.startup,"ax",@progbits
.LHOTB3:
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB26:
        .cfi_startproc
        subq    $120, %rsp      #,
        .cfi_def_cfa_offset 128
        xorl    %esi, %esi      #
        movl    $.LC1, %edi     #,
        xorl    %eax, %eax      #
        call    open    #
        movl    %eax, %r8d      #, fd
        movzwl  .LC2(%rip), %eax        #, tmp92
        leaq    8(%rsp), %rdi   #, tmp93
        movl    $11, %ecx       #, tmp95
        movl    $104, %edx      #,
        movq    %rsp, %rsi      #,
        movl    $0, 4(%rsp)     #, buf1
        movw    %ax, (%rsp)     # tmp92, buf1
        xorl    %eax, %eax      #
        movw    %ax, 2(%rsp)    #, buf1
        xorl    %eax, %eax      # tmp94
        rep stosq
        movl    $0, (%rdi)      #, buf1
        movl    %r8d, %edi      # fd,
        call    read    #
        movq    %rsp, %rax      #, D.3346
.L3:
        movl    (%rax), %edx    #* D.3346, tmp100
        addq    $4, %rax        #, D.3346
        leal    -16843009(%rdx), %ecx   #, tmp99
        notl    %edx    # tmp100
        andl    %edx, %ecx      # tmp100, tmp99
        andl    $-2139062144, %ecx      #, tmp99
        je      .L3     #,
        xorl    %eax, %eax      #
        addq    $120, %rsp      #,
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE26:
        .size   main, .-main

The compiler inlined the call to foobar and optimized its body to nothing (since foobar has no observable side effect in your source code). As a result, no call to foobar remains in main, because such a call would be useless.
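To make "no observable side effect" concrete, here is a minimal sketch that is not part of the original program. The name first_char is hypothetical; it is a variant of foobar that returns the byte it loads. Assuming the caller uses the result, the optimizer should have to keep the load, whereas the original foobar body can be reduced to a bare ret.

```c
/* Hypothetical variant of foobar: returning the byte makes the load
   part of the function's result, so it cannot simply be dropped
   when a caller uses the return value. */
char first_char(const char *a) {
    return a[0];
}

/* Same body as the original foobar: the local c is never read,
   so with optimization enabled the compiler may remove the load. */
void foobar(char *a) {
    char c = a[0];
    (void)c; /* silences the unused-variable warning */
}
```

Compiling both with gcc -O2 -S and comparing the two function bodies in the .s file is one way to see the difference.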

You might try to compile with -fdump-tree-all. You'll get hundreds of dump files, corresponding to the many GCC optimization passes producing them.

You could also customize your gcc with MELT (a Lisp-like domain specific language to extend GCC), and you could even search some Gimple or Tree patterns using MELT's findgimple mode (a sort-of grep on the Gimple representations internal to GCC).

John M

This is a good question. I commend you for "peeking under the hood", so to speak.

Tons of research has gone into compiling code. Sometimes you want code to run fast, sometimes you want it to be small, and sometimes you want it to compile quickly. Thanks to compilers research, a compiler can generate code that behaves in any of these mentioned ways. To allow users to pick which one of these options they want, gcc has command line options that control the level of optimization.

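By default, gcc uses -O0, which performs almost no optimization and instead favors fast compilation and predictable debugging. Because of this, you will find inefficient instruction sequences, such as the store of rdi to the stack immediately followed by a reload from the same location that you noticed in foobar.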


When you turn on the -O3 flag, the compiler inlines the code for foobar. As you know, function calls take time, so, if the function foobar is short enough, the compiler will just copy the whole code for foobar instead of calling it, thereby eliminating the need for the call and ret instructions. This makes the code a tiiiiiny bit faster, but it also makes it bigger.

Consider a 100-instruction function that is called 100 times. If this function is inlined, the code size will increase drastically, for not much extra speed. The compiler only inlines code if you have a high optimization level set and the function in question is quite small.
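If you want to watch the inlining decision in a disassembly, a small experiment along these lines may help. This is only a sketch: the function names are hypothetical, and __attribute__((noinline)) is a GCC/Clang extension rather than standard C. At -O2 the call to add_noinline should normally remain visible as a call instruction, while add_inlinable is a typical candidate for being folded into its caller.

```c
/* GCC/Clang-specific attribute: ask the compiler never to inline this. */
__attribute__((noinline)) int add_noinline(int x, int y) {
    return x + y;
}

/* Small static function: a typical inlining candidate when optimizing. */
static int add_inlinable(int x, int y) {
    return x + y;
}

/* Calls both helpers so the two cases can be compared in one listing. */
int sum_both(int x, int y) {
    return add_noinline(x, y) + add_inlinable(x, y);
}
```

Disassembling sum_both at -O0 and at -O2 shows how the two helpers are treated at each level.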

You have probably noticed that there is nothing in place of the call to foobar. It has been "optimized out", meaning that the compiler completely deleted it. This is because the compiler can tell that foobar doesn't do anything useful. That is, it has no side effects. At -O0, nothing is optimized out. At higher optimization levels, gcc starts to remove code with no side effects to save space and time, which is also why the body of foobar itself shrinks to a single repz ret.
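One way to give a function like this a side effect, so that it survives optimization, is to write through a volatile object. The sketch below is illustrative only: sink and foobar_observable are hypothetical names, not part of the original program. Accesses to volatile objects count as observable behavior in C, so the compiler has to perform the store.

```c
/* Hypothetical sink: writes to a volatile object must actually happen. */
volatile char sink;

/* Unlike the original foobar, this version has a side effect:
   it stores the first byte of a into the volatile sink. */
void foobar_observable(const char *a) {
    sink = a[0];
}
```

With this version, the load from a and the store to sink should still appear in the output at -O2, whether the function is inlined into its caller or not.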

I haven't written x86 assembly in a few years (just ARM nowadays), but if I recall correctly, repz ret is effectively a more efficient form of ret because of branch prediction. More info can be found here.

I have to go to sleep now. If you still have questions, I will respond later :).