Extracting LLVM bitcode embedded using `-lto-embed-bitcode`


Goal: Extract full-program (merged) post-LTO bitcode from an ELF binary.

The program happens to be written in Rust, but I don't think that's a crucial detail.

I'm able to compile a Rust program into an ELF binary with a .llvmbc section using the following Cargo invocation:

RUSTFLAGS="-C linker_plugin_lto -C linker=clang -C link-arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized" \
    cargo build --release

I'm then able to use readelf -S target/release/world | grep llvmbc to verify that the section exists. It does. Superb!

I'd now like to extract the full-program post-LTO bitcode and disassemble it:

$ objcopy target/release/world --dump-section .llvmbc=llvm.bc
$ llvm-dis llvm.bc
LLVM ERROR: Invalid encoding
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-dis llvm.bc
 #0 0x000055c06f7ae78c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b578c)
 #1 0x000055c06f7ac6e4 llvm::sys::RunSignalHandlers() (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b36e4)
 #2 0x000055c06f7ac843 SignalHandler(int) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b3843)
 #3 0x00007f6c62eaf730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730)
 #4 0x00007f6c627957bb raise /build/glibc-vjB4T1/glibc-2.28/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007f6c62780535 abort /build/glibc-vjB4T1/glibc-2.28/stdlib/abort.c:81:7
 #6 0x000055c06f783753 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x18a753)
 #7 0x000055c06f783868 (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x18a868)
 #8 0x000055c06f7bb703 llvm::BitstreamCursor::ReadAbbrevRecord() (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1c2703)
 #9 0x000055c06f64a49d llvm::BitstreamCursor::advance(unsigned int) (.constprop.1679) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x5149d)
#10 0x000055c06f658abd llvm::getBitcodeFileContents(llvm::MemoryBufferRef) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x5fabd)
#11 0x000055c06f63e159 main (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x45159)
#12 0x00007f6c6278209b __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:342:3
#13 0x000055c06f6436ea _start (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x4a6ea)
Aborted

If I search the binary for the LLVM bitcode header's magic bytes, 0x4243c0de, there are multiple hits. Furthermore, if I tell rustc to use a single codegen unit (-C codegen-units=1), there are fewer hits for the magic bytes (exactly two).
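For reference, the search for the magic bytes can be reproduced with something like the following. This is only a minimal sketch: the function name find_magic_offsets is made up for illustration, and it assumes raw bitcode (the bytes 'B', 'C', 0xC0, 0xDE) rather than the bitcode wrapper format.

```python
from typing import List

# Raw LLVM bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = bytes([0x42, 0x43, 0xC0, 0xDE])


def find_magic_offsets(data: bytes, magic: bytes = BITCODE_MAGIC) -> List[int]:
    """Return every offset in `data` at which `magic` occurs.

    Hypothetical helper: it only reports where the byte pattern appears
    and does not check that a valid bitcode module starts there.
    """
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so that no occurrence is skipped.
        pos = data.find(magic, pos + 1)
    return offsets
```

Calling find_magic_offsets(open("llvm.bc", "rb").read()) on the dumped section lists the candidate module starts, one offset per hit.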

What I think is happening is the linker is concatenating the .llvmbc sections of the intermediate objects with the post-LTO bitcode, and this confuses llvm-dis.

Assuming that's the case, how can I unambiguously extract only the post-LTO bitcode? I don't feel comfortable with trying to separate the different modules based on the magic bytes. This seems error prone, as that byte sequence could appear elsewhere by coincidence (i.e. not marking the start of a bitcode object at all).
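To make the concern concrete, here is a rough sketch of the magic-byte splitting approach I'd rather avoid. The name split_at_magic is hypothetical, and the code assumes the section is nothing more than raw bitcode modules laid end to end, which I have not confirmed.

```python
from typing import List

# Raw LLVM bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = bytes([0x42, 0x43, 0xC0, 0xDE])


def split_at_magic(section: bytes, magic: bytes = BITCODE_MAGIC) -> List[bytes]:
    """Cut `section` immediately before each occurrence of `magic`.

    Bytes before the first occurrence are dropped. Nothing here verifies
    that an occurrence really begins a module (assumption: it does), so a
    coincidental match inside a module yields a bogus extra piece.
    """
    starts = []
    pos = section.find(magic)
    while pos != -1:
        starts.append(pos)
        pos = section.find(magic, pos + len(magic))
    # Each piece runs up to the next start, the last one to the end.
    ends = starts[1:] + [len(section)]
    return [section[s:e] for s, e in zip(starts, ends)]
```

A single module whose payload happens to contain the four magic bytes comes back as two pieces, which is exactly the failure mode I'm worried about.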

Is there perhaps a way to make libLTO put the post-LTO bitcode into a dedicated section of a different name? Having read the source code, I don't think it's possible without modifications.

Thanks

EDIT

Repeating the experiment using clang instead of rustc actually seems to work, so I'm starting to wonder if this is in fact a Rust bug. Perhaps rustc is passing the old pre-merge bitcode through when it shouldn't?

$ clang -fuse-ld=lld -flto -Wl,--plugin-opt=-lto-embed-bitcode=optimized world.c
$ objcopy a.out --dump-section .llvmbc=llvm.bc
$ llvm-dis llvm.bc
$ head -5 llvm.ll
; ModuleID = 'llvm.bc'
source_filename = "ld-temp.o"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
