Good algorithm for generating call graphs?

1.2k views Asked by At

I am writing some code to generate call graphs for a particular intermediate representation without executing it by statically scanning the IR code. The IR code itself is not too complex and I have a good understanding of what function call sequences look like so all I need to do is trace the calls. I am currently doing it the obvious way:

  • Keep track of where we are
  • If we encounter a function call, branch to that location, execute and come back
  • While branching put an edge between the caller and the callee

I am satisfied with where I am getting at but I want to make sure that I am not reinventing the wheel here and face corner cases. I am wondering if there are any accepted good algorithms (and/or design patterns) that do this efficiently?

UPDATE: The IR code is a byte-code disassembly from a homebrewn Java-like language and looks like the Jasmine specification.

2

There are 2 answers

6
phooji On BEST ANSWER

From an academic perspective, here are some considerations:

  • Do you care about being conservative / correct? For example, suppose the code you're analyzing contains a call through a function pointer. If you're just generating documentation, then it's not necessary to deal with this. If you're doing a code optimization that might go wrong, you will need to assume that 'call through pointer' means 'could be anything.'

  • Beware of exceptional execution paths. Your IR may or may not abstract this away from you, but keep in mind that many operations can throw both language-level exceptions as well as hardware interrupts. Again, it depends on what you want to do with the call graph later.

  • Consider how you'll deal with cycles (e.g. recursion, mutual recursion). This may affect how you write code for traversing the graphs later on (i.e., they will need some sort of 'visited' set to avoid traversing cycles forever).

Cheers.

Update March 6:

Based on extra information added to the original post:

  • Be careful about virtual method invocations. Keep in mind that, in general, it is unknowable which method will execute. You may have to assume that the call will go to any of the subclasses of a particular class. The standard example goes a bit like this: suppose you have an ArrayList<A>, and you have class B extends A. Based on a random number generator, you will add instances of A and B to the list. Now you call x.foo() for all x in the list, where foo() is a virtual method in A with an override in B. So, by just looking at the source code, there is no way of knowing whether the loop calls A.foo, B.foo, or both at run time.
1
Russ On

I don't know the algorithm, but pycallgraph does a decent job. It is worth checking out the source for it. It is not long and should be good for checking out existing design patterns.