LLVM code generation causes problems with pointer arithmetic


I'm writing my own operating system in my own language (the compiler is built on LLVM), but my code causes a page fault. This is the LLVM IR generated by my compiler:

define i32 @tarLookup(ptr %archive, ptr %filename, ptr %out) {
entry:
  %fz = alloca i32, align 4
  %ptr = alloca ptr, align 8
  %archive1 = alloca ptr, align 8
  store ptr %archive, ptr %archive1, align 8
  %filename2 = alloca ptr, align 8
  store ptr %filename, ptr %filename2, align 8
  %out3 = alloca ptr, align 8
  store ptr %out, ptr %out3, align 8
  %archive4 = load ptr, ptr %archive1, align 8
  store ptr %archive4, ptr %ptr, align 8
  br label %cond

cond:
  %ptr5 = load ptr, ptr %ptr, align 8
  %tmpint = ptrtoint ptr %ptr5 to i64
  %tmpextendr = sext i32 257 to i64
  %tmp = add nsw i64 %tmpint, %tmpextendr
  %calltmp = call i1 @memeq(i64 %tmp, ptr @0, i32 5)
  br i1 %calltmp, label %loop, label %exit

loop:                                             ; preds = %cond
  %ptr6 = load ptr, ptr %ptr, align 8
  %tmpint7 = ptrtoint ptr %ptr6 to i64
  %tmpextendr8 = sext i32 124 to i64
  %tmp9 = add nsw i64 %tmpint7, %tmpextendr8
  %calltmp10 = call i32 @oct2bin(i64 %tmp9, i32 11)
  store i32 %calltmp10, ptr %fz, align 4
  %filename11 = load ptr, ptr %filename2, align 8
  %calltmp12 = call i32 @strlen(ptr %filename11)
  %tmp13 = add nsw i32 %calltmp12, 30
  br label %cond15

cond15:                                           ; preds = %loop
  %ptr16 = load ptr, ptr %ptr, align 8
  %filename17 = load ptr, ptr %filename2, align 8
  %filename18 = load ptr, ptr %filename2, align 8
  %calltmp19 = call i32 @strlen(ptr %filename18)
  %calltmp20 = call i1 @memeq(ptr %ptr16, ptr %filename17, i32 %calltmp19)
  br i1 %calltmp20, label %if, label %exit14

if:                                               ; preds = %cond15
  %ptr21 = load ptr, ptr %ptr, align 8
  %tmpint22 = ptrtoint ptr %ptr21 to i64
  %tmpextendr23 = sext i32 512 to i64
  %tmp24 = add nsw i64 %tmpint22, %tmpextendr23
  %out25 = load ptr, ptr %out3, align 8
  store i64 %tmp24, ptr %out25, align 4
  %fz26 = load i32, ptr %fz, align 4
  ret i32 %fz26

exit14:                                           ; preds = %cond15
  %ptr27 = load ptr, ptr %ptr, align 8 ;<--- the expression
  %fz28 = load i32, ptr %fz, align 4
  %tmp29 = add nsw i32 %fz28, 511
  %tmp30 = sdiv i32 %tmp29, 512
  %tmp31 = add nsw i32 %tmp30, 1
  %tmp32 = mul nsw i32 %tmp31, 512
  %tmpint33 = ptrtoint ptr %ptr27 to i64
  %tmpextendr34 = sext i32 %tmp32 to i64
  %tmp35 = add nsw i64 %tmpint33, %tmpextendr34
  store i64 %tmp35, ptr %ptr, align 4
  br label %cond

exit:                                             ; preds = %cond
  ret i32 0
}

Sorry for the bogus names in the IR; I only found out much later that LLVM doesn't require values to be named :)

Although the page fault occurs inside the memeq function, I traced the problem to the ptr pointer that tarLookup passes to memeq. The pointer somehow ends up pointing into the code segment (above 0xFFFFFFFFFF...). I initially thought a pointer dereference was being missed, so that the arithmetic was being done on the address of %ptr itself rather than on the pointer stored in it. But I now think the problem is tied to this instruction in the generated object code:

and    $0xfffffe00,%eax

For context, the expression that causes the problem is shown below in C. I thought some expression optimization in LLVM was responsible, but the same code is produced even with all optimizations turned off. I'm confident the problem lies in the code generated for this expression: memeq works perfectly the first time the loop runs, but after the pointer arithmetic in this expression, the pointer passed to memeq in the next iteration is completely off.

ptr += (((filesize + 511) / 512) + 1) * 512;
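For reference, here is a minimal sketch of the arithmetic in that expression, written as standalone C. The function names are hypothetical and not part of my kernel. The second function relies on the fact that, for non-negative values, dividing by 512 and multiplying by 512 again is the same as clearing the low 9 bits, and the 32-bit mask for that is 0xfffffe00. That may be where the and instruction comes from, but this is an assumption and not something I have confirmed from the generated assembly.

```c
#include <stdint.h>

/* Bytes to skip from one USTAR header to the next: one 512-byte header
   block plus the file data rounded up to whole 512-byte blocks.
   Mirrors the expression above; the function name is hypothetical. */
static int64_t ustar_stride(int32_t filesize)
{
    int32_t stride = (((filesize + 511) / 512) + 1) * 512;
    /* Widen to 64 bits before adding to a pointer, like the sext in the IR. */
    return (int64_t)stride;
}

/* The same value written with a mask. Valid only while filesize + 511
   is non-negative: (x / 512) * 512 == x & ~511 for x >= 0. */
static int64_t ustar_stride_masked(int32_t filesize)
{
    int32_t rounded = (filesize + 511) & ~511; /* ~511 is 0xfffffe00 in 32 bits */
    return (int64_t)rounded + 512;
}
```

Both functions compute the stride in 32 bits and only then widen it, so the result is correct only as long as the 32-bit intermediate does not overflow and is extended with the right signedness.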

The same code in C compiled with clang has no such and instruction :")

The page from osdev: USTAR
The full generated assembly code: the pastebin link

I can't debug this with GDB under QEMU yet, because my compiler doesn't emit debug information.

More details:
Target platform: x86_64-elf
Linker: x86_64-elf-ld (built from source)
Bootloader: BOOTBOOT
I'm trying to load a file from a USTAR initrd. The pointers passed to tarLookup are correct: as mentioned above, it works in the first iteration of the loop (see the osdev page for the source code). I verified this with serial output from the kernel.

The C code for achieving this: compiler explorer link
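In case the link is unavailable, below is a rough sketch of what the function is meant to do. It is reconstructed from the IR above, not copied from the linked code. Assumptions: memcmp from the C library stands in for my memeq, and the oct2bin shown here is a hypothetical stand-in that parses plain octal digits. The offsets 257 (the "ustar" magic), 124 (the size field) and 512 (the block size) are the ones that appear in the IR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for oct2bin: parse `len` ASCII octal digits. */
static int oct2bin(const unsigned char *s, int len)
{
    int n = 0;
    while (len-- > 0) {
        n = n * 8 + (*s++ - '0');
    }
    return n;
}

/* Walk the archive header by header. On a match, store a pointer to the
   file data in *out and return the file size; return 0 if not found. */
static int tar_lookup(const unsigned char *archive, const char *filename,
                      const unsigned char **out)
{
    const unsigned char *ptr = archive;
    size_t name_len = strlen(filename);

    /* A valid header carries the magic "ustar" at offset 257. */
    while (memcmp(ptr + 257, "ustar", 5) == 0) {
        int filesize = oct2bin(ptr + 124, 11);

        /* Compare strlen(filename) bytes of the name, as the IR does. */
        if (memcmp(ptr, filename, name_len) == 0) {
            *out = ptr + 512; /* file data starts right after the header block */
            return filesize;
        }

        /* Skip the header block plus the data rounded up to 512 bytes. */
        ptr += (((filesize + 511) / 512) + 1) * 512;
    }
    return 0;
}
```

In this version the pointer stays a pointer throughout, whereas my compiler converts it to an integer with ptrtoint, adds the offset, and stores the resulting i64 back.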

Please help me in finding where this problem originates from. I've spent all day on this problem to no avail!
