Can the global offset table be defined manually?


I'm attempting to build a flat 32-bit PIC binary with the following C++ code:

extern "C" {
void print(const char *){}

void entry_func() {
  print("abcd\n");
}
}

The assembly produced for the print("abcd\n") call is:

        calll   .L1$pb
.L1$pb:
        popl    %ebx
.Ltmp3:
        addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp3-.L1$pb), %ebx
        leal    .L.str@GOTOFF(%ebx), %eax
        movl    %eax, (%esp)
        calll   print@PLT

If I use GNU ld to link a flat binary using this linker script:

SECTIONS {
  . = 16M;
  .text : ALIGN(4K) {
    *(.text)
  }
}

I get the following link error:

undefined reference to `_GLOBAL_OFFSET_TABLE_'

First Issue

Given the assembly I showed earlier, should I still expect the linker to produce a GOT even for a flat binary?

In the corresponding object file, I see these two relocations:

 Offset     Info    Type                Sym. Value  Symbol's Name
0000001d  00000a0a R_386_GOTPC            00000000   _GLOBAL_OFFSET_TABLE_
00000023  00000309 R_386_GOTOFF           00000000   .L.str

Now, according to this documentation I found, I would expect the linker to emit a GOT:

R_386_GOTOFF

Computes the difference between a symbol's value and the address of the global offset table. It also instructs the link-editor to create the global offset table.

R_386_GOTPC

Resembles R_386_PC32, except that it uses the address of the global offset table in its calculation. The symbol referenced in this relocation normally is _GLOBAL_OFFSET_TABLE_, which also instructs the link-editor to create the global offset table.

Is this an issue with ld, or does ld simply not need to emit a GOT because I'm producing a flat binary rather than an ELF binary?

Second Issue

I can work around this error by also compiling and linking in a .S file that defines this symbol:

  .globl _GLOBAL_OFFSET_TABLE_
  .section .got,"wa",@progbits
_GLOBAL_OFFSET_TABLE_:
  .word 0xabcd  // Filler data so it's easier to find in the objdump

This links successfully, but my binary seems to be incorrect when I objdump it:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 45 08                mov    0x8(%ebp),%eax
   6:   5d                      pop    %ebp
   7:   c3                      ret    
   8:   90                      nop
...
   f:   90                      nop
  10:   55                      push   %ebp
  11:   89 e5                   mov    %esp,%ebp
  13:   53                      push   %ebx
  14:   50                      push   %eax
  15:   e8 00 00 00 00          call   0x1a
  1a:   5b                      pop    %ebx
  1b:   81 c3 1b 00 00 00       add    $0x1b,%ebx
  21:   8d 83 38 00 00 01       lea    0x1000038(%ebx),%eax
  27:   89 04 24                mov    %eax,(%esp)
  2a:   e8 d1 ff ff ff          call   0x0
  2f:   83 c4 04                add    $0x4,%esp
  32:   5b                      pop    %ebx
  33:   5d                      pop    %ebp
  34:   c3                      ret    
  35:   cd ab                   int    $0xab
  37:   00 61 62                add    %ah,0x62(%ecx)
  3a:   63 64 0a 00             arpl   %sp,0x0(%edx,%ecx,1)

The value of $_GLOBAL_OFFSET_TABLE_+(.Ltmp3-.L1$pb) appears to have been computed correctly: _GLOBAL_OFFSET_TABLE_ carries the relocation R_386_GOTPC and resolves to the offset between the GOT (0x35) and the current PC (0x1b), and (.Ltmp3-.L1$pb) is just 1 byte, the size of popl %ebx (so 0x35 - 0x1b + 0x1 = 0x1b).

My second issue is that the value for .L.str@GOTOFF seems to assume the GOT is at address zero. Its corresponding relocation is R_386_GOTOFF, which is calculated as the offset between the symbol (.L.str) and the GOT. My binary starts at 16M (from the linker script) and .L.str sits at offset 0x38 into the binary, so the symbol's address is 0x1000038. The encoded displacement is also 0x1000038, which implies the GOT was taken to be at address zero; I would have expected 0x1000038 - 0x1000035 = 0x3.

My second question: is there a way to tell the linker manually where the GOT is? I'm guessing my _GLOBAL_OFFSET_TABLE_ trick didn't work because _GLOBAL_OFFSET_TABLE_ is more likely a symbol the linker emits to mark where it placed the GOT, rather than the other way around (the linker finding wherever _GLOBAL_OFFSET_TABLE_ is defined and placing the GOT there).

My overall goal is to see whether I can write a flat PIC binary in pure C/C++ (to a certain extent). I know that, at least for this small example, I could avoid the GOT in pure assembly with something like:

  call .L$pb
.L$pb:
  pop %ebx
  addl $(.L.str - .L$pb), %ebx
  movl %ebx, (%esp)
  calll print@PLT

Here, rather than adding the offset between the PC and the GOT to the offset between the GOT and .L.str, I take the offset between the PC and .L.str directly. This emits an R_386_PC32 relocation for $(.L.str - .L$pb), which can be resolved statically. The result is still PIC, but without the GOT.

Just as the linker can relax PLT relocations into direct relative calls when the call site and the function definition are in the same binary, I wonder whether there is a way to relax these two GOT relocations so that they correctly produce a relative reference to my binary-local data.
