Where to patch back the information gathered during program analysis


I'm new to compiler design and have a few years of experience with Java.

Using this and the paper, it looks like class hierarchy analysis and rapid type analysis give you the information needed for devirtualisation. But where do you patch that information back in: the source code or the bytecode? And how do you check the results?

I'm trying to understand how this really happens, but I'm stuck here. For example, take this program from the paper mentioned above:

public class MyProgram {
    public static void main(String[] args) {
        EUCitizen citizen = getCitizen();
        citizen.hasRightToVote();               // Call site 1
        Estonian estonian = getEstonian();
        estonian.hasRightToVote();              // Call site 2
    }

    private static EUCitizen getCitizen() {
        return new Estonian();
    }

    private static Estonian getEstonian() {
        return new Estonian();
    }
}

Using class hierarchy analysis, we can conclude that, since none of the subclasses override hasRightToVote(), the dynamic method invocation can be replaced with a static procedure call to Estonian#hasRightToVote(). But where and how do we make this replacement? How do we tell the JVM what we learned during analysis?

You can't change the source code and put it there, can you? Could anyone provide an example, so I can start trying new ways to do analysis and still be able to patch that information back? Thanks.


There are 3 answers

1
user2754673 On

I had the same doubts, and Rohan Padhey cleared them up.

In Java, I don't think there is a way to specify monomorphism of virtual method calls in bytecode. The devirtualization analysis usually happens in the JIT compiler, which compiles bytecode to native code, and it does so using dynamic analysis.

Why patching is a problem:

In Java bytecode, the only method call instructions are invokestatic, invokedynamic, invokevirtual, invokeinterface and invokespecial (the last is used for constructors, etc.). The only one you can use for an arbitrary call without a virtual method table lookup is invokestatic, since static methods cannot be overridden and used polymorphically on objects. (invokespecial is also statically bound, but it is restricted to cases such as constructors, private methods and super calls.)

Hence, while there is no way to specify the target method of a virtual call at compile time, you can replace virtual calls with static calls. How? Consider an object "x" with a method "foo", and a call site:

x.foo(arg1, arg2, ...)

If you know for sure that "x" is of the class "A", then you can transform this to:

A.static_foo(x, arg1, arg2, ...)

where "static_foo" is a newly created static method in class A whose body contains exactly everything that the body of "foo()" in "A" would have done, except that references to "this" inside the body should now be replaced by the first parameter, whatever you may call it.
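To make the rewrite concrete, here is a minimal hand-written sketch of it. The class A, its field "base" and the arithmetic in the body are made up for illustration; a tool such as Soot would generate the static copy from bytecode rather than from source.

```java
// Hypothetical class: only the shape of the transformation matters here.
class A {
    private final int base;

    A(int base) {
        this.base = base;
    }

    // Original instance method; a call x.foo(...) compiles to invokevirtual.
    int foo(int arg1, int arg2) {
        return this.base + arg1 * arg2;
    }

    // Static copy of foo's body: every use of "this" becomes the explicit
    // first parameter "self". A call to it compiles to invokestatic.
    static int static_foo(A self, int arg1, int arg2) {
        return self.base + arg1 * arg2;
    }
}

class StaticCallDemo {
    public static void main(String[] args) {
        A x = new A(10);
        int viaVirtual = x.foo(2, 3);           // dynamic dispatch
        int viaStatic = A.static_foo(x, 2, 3);  // statically bound
        System.out.println(viaVirtual + " " + viaStatic); // prints "16 16"
    }
}
```

The two calls agree only as long as "x" really is an A at run time. If a subclass overrode foo and "x" held an instance of it, static_foo would silently run the wrong body, which is why the analysis has to be sound.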

That is exactly what the Whole-Jimple-Optimization-Pack (WJOP) in Soot does.

As for static analysis using Soot, there is an optimization pack that does devirtualization using a work-around: https://github.com/Sable/soot/wiki/Whole-program-Devirtualization-Optimizations. But that's just a hack.

Why the JIT does it better:

The JIT does this better because a static analysis has to be sound: before making the transformation you must be sure that the target of the virtual call is the same class 100% of the time. A JIT compiler finds more opportunities, because even if the target is a single class only 90% of the time, it can compile the code for the most frequently taken route and fall back to the bytecode in the 10% of cases where the prediction is wrong, since it can detect the misprediction dynamically. The fall-back is expensive, but being right 90% of the time still gives an overall benefit. With a static transformation, you have to decide up front whether or not to optimize, and the decision had better be sound.
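As a rough source-level analogy of that speculative approach (not what a JIT literally emits, since it works on machine code and can discard compiled code entirely), the guarded call could be sketched like this. The classes Common and Rare and the helper commonFoo are hypothetical names chosen for the example.

```java
// Hypothetical hierarchy: Common is the frequently seen receiver type,
// Rare is the subclass that occasionally shows up and overrides foo.
class Common {
    int foo() {
        return 1;
    }
}

class Rare extends Common {
    @Override
    int foo() {
        return 2;
    }
}

class GuardedCallDemo {
    // Statically bound copy of Common.foo, standing in for the inlined fast path.
    static int commonFoo(Common self) {
        return 1;
    }

    // Cheap type check first; fall back to ordinary virtual dispatch
    // when the prediction "receiver is exactly Common" turns out wrong.
    static int call(Common c) {
        if (c.getClass() == Common.class) {
            return commonFoo(c); // fast path: no virtual lookup
        }
        return c.foo();          // slow path: normal invokevirtual
    }

    public static void main(String[] args) {
        System.out.println(call(new Common()) + " " + call(new Rare())); // prints "1 2"
    }
}
```

The guard is what lets the optimization stay correct without a sound whole-program proof: a wrong guess costs time, not correctness.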

0
Ira Baxter On

Analysis results are typically stored as some kind of association with a program representation, or are used immediately to effect the optimization, so "nothing" needs to be stored.

You are right: there is generally no "good" way to annotate the source code with an analysis result (though you could use Java annotations for this). But the compiler has already read the source code and isn't going to read it again.

In general, the program is parsed and a variety of compiler-like structures are built (ASTs, symbol tables, control flow graphs, data flow arcs, ...) by the compiler pretty much before any serious analysis/optimization begins. A low-level model of the program (data flow over the operators) is normally what gets analyzed, and the optimization analyzer will either decorate this structure with its opinions, or often just directly modify this structure to achieve the effect of the optimization.

With Java, there are two opportunities to do this: in JavaC, and in the JITter. My understanding (probably wrong, probably varies across JavaC implementations) is that not much optimization occurs in JavaC at all; it just generates naive JVM bytecode, and that all the real work is done in the JITter. The JITter doesn't have source code, but it can do all the same kinds of analysis (control flow, dataflow, ...) on the byte code that one can do on classic compiler structures, and thus achieve the same effect.

3
the8472 On

Class Hierarchy Analysis is an optimization done by the virtual machine itself at runtime; you do not have to tell the VM anything. It simply does the analysis by itself, based on the information available in the class files.
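To get a feel for what the analysis decides, here is a small sketch that asks the same question using reflection over a fixed list of classes. It is not how a VM implements CHA. The class bodies, the extra subclasses Latvian and NonVoter, and the helper names are assumptions made for the example, and it only handles no-argument methods on classes (no interfaces).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumed hierarchy; the paper's actual class bodies are not reproduced here.
class EUCitizen {
    boolean hasRightToVote() { return true; }
}
class Estonian extends EUCitizen { }
class Latvian extends EUCitizen { }
// Hypothetical subclass that overrides the method, to show the negative case.
class NonVoter extends EUCitizen {
    @Override
    boolean hasRightToVote() { return false; }
}

class ChaSketch {
    // Walks up from c to the first class that declares the method,
    // i.e. the implementation an instance of c would actually run.
    static Class<?> implementer(Class<?> c, String name) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            try {
                k.getDeclaredMethod(name);
                return k;
            } catch (NoSuchMethodException e) {
                // Not declared in k; keep looking in the superclass.
            }
        }
        throw new IllegalArgumentException(name + " not found on " + c);
    }

    // All implementations a call on declaredType could reach, given the
    // set of classes known to exist ("world").
    static Set<Class<?>> targets(Class<?> declaredType, String name, List<Class<?>> world) {
        Set<Class<?>> result = new HashSet<>();
        for (Class<?> c : world) {
            if (declaredType.isAssignableFrom(c)) {
                result.add(implementer(c, name));
            }
        }
        return result;
    }

    // A call site can be devirtualized when exactly one target remains.
    static boolean isMonomorphic(Class<?> declaredType, String name, List<Class<?>> world) {
        return targets(declaredType, name, world).size() == 1;
    }

    public static void main(String[] args) {
        List<Class<?>> noOverrides = List.of(EUCitizen.class, Estonian.class, Latvian.class);
        List<Class<?>> withOverride = List.of(EUCitizen.class, Estonian.class, NonVoter.class);
        System.out.println(isMonomorphic(EUCitizen.class, "hasRightToVote", noOverrides));  // prints "true"
        System.out.println(isMonomorphic(EUCitizen.class, "hasRightToVote", withOverride)); // prints "false"
    }
}
```

The second result shows why the VM has to keep watching: loading one new class that overrides the method invalidates the earlier conclusion.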