Efficient way to parse DWARF


I am trying to build a debugger that lets me set breakpoints at functions or lines of code. The required debug information should be extracted from the DWARF sections of an ELF file. I am able to extract this data. The project I want to debug has 50-100 files, so it takes about 10 minutes to parse the ELF with readelf or pyelftools for all the DWARF info I need. To speed things up, my next approach was to parse only the debug info for the currently opened source file. But that also takes a few minutes with pyelftools.

How do debuggers get the information so fast? I use an iSystem debugger with winIDEA; it takes about 20 seconds to flash the ELF, and afterwards I can instantly set breakpoints in any source file.

I am new to the topic so any help is appreciated.

EDIT: This is how I use pyelftools to get function addresses from one file

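from elftools.elf.elffile import ELFFile

# ELF_PATH and FILENAME are placeholders assumed to be defined elsewhere
def main():
  with open(ELF_PATH, "rb") as f:
    elffile = ELFFile(f)
    dwarfinfo = elffile.get_dwarf_info()

    for CU in dwarfinfo.iter_CUs():
      top_DIE = CU.get_top_DIE()

      # Only walk the compilation unit that belongs to the wanted source file
      if FILENAME in top_DIE.get_full_path():
        die_info_rec(top_DIE)
        return

def die_info_rec(die):
  if die.tag == "DW_TAG_subprogram":
    # Function found, get data
    return
  # Not a function: keep searching in the child DIEs
  for child in die.iter_children():
    die_info_rec(child)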

There is 1 answer

Employed Russian (best answer)

> How do debuggers get the information so fast?

By reading only the info they need (the DWARF format is structured so that you can efficiently skip over translation units and functions you are not interested in), and by doing it in C.
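To illustrate the skipping: every unit in .debug_info starts with a length field, so a reader can hop from one unit header to the next without decoding any of the entries in between. The following is only a rough Python sketch of that idea, not how any particular debugger does it. It assumes you already have the raw bytes of .debug_info, and the function name iter_unit_headers is made up for this example.

```python
import struct

def iter_unit_headers(debug_info: bytes, little_endian: bool = True):
    """Yield (offset, version, total_size) for each unit in a raw .debug_info blob.

    Hypothetical helper for illustration: it reads only the length and
    version fields of each unit header and never decodes any DIEs.
    """
    end = "<" if little_endian else ">"
    pos = 0
    while pos < len(debug_info):
        # 32-bit DWARF: a 4-byte unit_length that excludes the field itself
        (length,) = struct.unpack_from(end + "I", debug_info, pos)
        header = 4
        if length == 0xFFFFFFFF:
            # 64-bit DWARF: escape value followed by an 8-byte length
            (length,) = struct.unpack_from(end + "Q", debug_info, pos + 4)
            header = 12
        # The 2-byte DWARF version follows the length field
        (version,) = struct.unpack_from(end + "H", debug_info, pos + header)
        total = header + length
        yield pos, version, total
        pos += total  # jump directly to the next unit
```

With the offsets in hand, a tool can decode just the one unit it cares about and leave the rest untouched, which is the kind of shortcut that parsing a full text dump cannot take.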

> I need about 10 min to parse the elf with readelf or pyelftools

That is likely a significant part of your problem: parsing readelf output is probably 100 to 1000 times less efficient than reading the info directly.

pyelftools does appear to provide an API to iterate over compilation units, and in theory should be able to provide efficient access.

You didn't show how you are using it, so you may not be using it efficiently.
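For what it's worth, here is a sketch of what more selective use of that API might look like. It assumes a dwarfinfo object as returned by pyelftools' ELFFile.get_dwarf_info(). The helper name functions_in_cu is invented for this example, and it only looks at direct children of the unit's top DIE, which is usually enough for C but will miss functions nested in C++ namespaces or classes.

```python
def functions_in_cu(dwarfinfo, filename):
    """Return {name: low_pc} for functions at the top level of one CU.

    Hypothetical helper: `dwarfinfo` is assumed to behave like the object
    returned by pyelftools' ELFFile.get_dwarf_info().
    """
    for cu in dwarfinfo.iter_CUs():
        top = cu.get_top_DIE()
        if filename not in top.get_full_path():
            continue  # not the wanted source file; do not walk this unit's DIEs
        found = {}
        # Look at direct children only; we do not recurse into function bodies
        for die in top.iter_children():
            if die.tag != "DW_TAG_subprogram":
                continue
            name = die.attributes.get("DW_AT_name")
            low_pc = die.attributes.get("DW_AT_low_pc")
            # Declarations and inlined-only functions may lack a name or address
            if name is not None and low_pc is not None:
                found[name.value.decode()] = low_pc.value
        return found  # stop after the first matching unit
    return {}
```

You would call it as functions_in_cu(ELFFile(f).get_dwarf_info(), "main.c") with f being the ELF opened in binary mode. Whether this is fast enough still depends on how much work the library does per DIE, so treat it as something to measure rather than a guaranteed fix.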

Even then, pyelftools is implemented in pure Python, so it is likely at least 10 times slower than something like libdwarf.