Goal: Extract full-program (merged) post-LTO bitcode from an ELF binary.
The program happens to be written in Rust, but I don't think that's a crucial detail.
I'm able to compile a Rust program into an ELF binary with a .llvmbc
section using the following Cargo invocation:
RUSTFLAGS="-C linker_plugin_lto -C linker=clang -C link-arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized" \
cargo build --release
I'm then able to use readelf -S target/release/world | grep llvmbc to verify that the section exists. It does. Superb!
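For anyone who wants to script this check rather than eyeball readelf output, here is a minimal sketch in Python that reads the section table directly. It assumes a 64-bit little-endian ELF (as on x86_64 Linux) with an ordinary section count (no extended numbering). `find_section` is a hypothetical helper name of mine, not part of any existing tool.

```python
import struct


def find_section(elf: bytes, wanted: str):
    """Return (file_offset, size) of the named section, or None if absent.

    Assumption: 64-bit little-endian ELF with a regular section header table.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")

    # e_shoff lives at byte 40; e_shentsize, e_shnum, e_shstrndx start at byte 58.
    (e_shoff,) = struct.unpack_from("<Q", elf, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", elf, 58)

    def section_header(index):
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
        return struct.unpack_from("<IIQQQQIIQQ", elf, e_shoff + index * e_shentsize)

    # Section names are NUL-terminated strings inside the section name string table.
    strtab_off, strtab_size = section_header(e_shstrndx)[4:6]
    strtab = elf[strtab_off:strtab_off + strtab_size]

    for index in range(e_shnum):
        sh_name, _, _, _, sh_offset, sh_size, *_ = section_header(index)
        name_end = strtab.index(b"\0", sh_name)
        if strtab[sh_name:name_end].decode() == wanted:
            return sh_offset, sh_size
    return None
```

With the binary read into `data`, `find_section(data, ".llvmbc")` gives an offset and size, and `data[offset:offset + size]` is the raw section payload.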
I'd now like to extract the full-program post-LTO bitcode and disassemble it:
$ objcopy target/release/world --dump-section .llvmbc=llvm.bc
$ llvm-dis llvm.bc
LLVM ERROR: Invalid encoding
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: llvm-dis llvm.bc
#0 0x000055c06f7ae78c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b578c)
#1 0x000055c06f7ac6e4 llvm::sys::RunSignalHandlers() (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b36e4)
#2 0x000055c06f7ac843 SignalHandler(int) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b3843)
#3 0x00007f6c62eaf730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730)
#4 0x00007f6c627957bb raise /build/glibc-vjB4T1/glibc-2.28/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x00007f6c62780535 abort /build/glibc-vjB4T1/glibc-2.28/stdlib/abort.c:81:7
#6 0x000055c06f783753 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x18a753)
#7 0x000055c06f783868 (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x18a868)
#8 0x000055c06f7bb703 llvm::BitstreamCursor::ReadAbbrevRecord() (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1c2703)
#9 0x000055c06f64a49d llvm::BitstreamCursor::advance(unsigned int) (.constprop.1679) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x5149d)
#10 0x000055c06f658abd llvm::getBitcodeFileContents(llvm::MemoryBufferRef) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x5fabd)
#11 0x000055c06f63e159 main (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x45159)
#12 0x00007f6c6278209b __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:342:3
#13 0x000055c06f6436ea _start (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x4a6ea)
Aborted
If I search the binary for the LLVM bitcode header's magic bytes, 0x4243c0de, there are multiple hits. Furthermore, if I tell rustc to use a single codegen unit (-C codegen-units=1), there are fewer hits for the magic bytes (exactly two).
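For reference, this is roughly how the hits can be counted. It is a small diagnostic sketch in Python and the function name `magic_offsets` is just something I made up. It only reports where the four magic bytes occur; it does not prove that a bitcode module starts at each offset.

```python
# The raw LLVM bitcode magic: 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = b"BC\xc0\xde"


def magic_offsets(blob: bytes) -> list[int]:
    """Return every byte offset in `blob` at which the bitcode magic occurs."""
    offsets = []
    pos = blob.find(BITCODE_MAGIC)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so that no occurrence is skipped.
        pos = blob.find(BITCODE_MAGIC, pos + 1)
    return offsets
```

Running it over the dumped section (for example `magic_offsets(open("llvm.bc", "rb").read())`) rather than over the whole binary shows whether the extra hits are inside .llvmbc itself.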
What I think is happening is that the linker is concatenating the .llvmbc sections of the intermediate objects with the post-LTO bitcode, and this confuses llvm-dis.
Assuming that's the case, how can I unambiguously extract only the post-LTO bitcode? I don't feel comfortable trying to separate the different modules based on the magic bytes. That seems error-prone, as the byte sequence could appear elsewhere by coincidence (i.e. without marking the start of a bitcode module at all).
Is there perhaps a way to make libLTO put the post-LTO bitcode into a dedicated section of a different name? Having read the source code, I don't think it's possible without modifications.
Thanks
EDIT
Repeating the experiment using clang instead of rustc actually seems to work, so I'm starting to wonder if this is in fact a Rust bug. Perhaps rustc is passing the old, pre-merge bitcode through when it shouldn't?
$ clang -fuse-ld=lld -flto -Wl,--plugin-opt=-lto-embed-bitcode=optimized world.c
$ objcopy a.out --dump-section .llvmbc=llvm.bc
$ llvm-dis llvm.bc
$ head -5 llvm.ll
; ModuleID = 'llvm.bc'
source_filename = "ld-temp.o"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"