Is it possible to decompile Java bytecode back to original generic type parameters

6.7k views Asked by OLIVER.KOO At 31 August 2017 at 21:26

I know that, during type erasure, the Java compiler replaces all type parameters in generic types with their bounds, or with Object if the type parameters are unbounded. The resulting bytecode reflects the replaced bounds or Object.
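A minimal sketch of the erasure described above, using reflection to look at the parameter types the JVM actually sees. The class `ErasureDemo` and its methods `max`, `first` and `erasedParams` are hypothetical names made up for this illustration.

```java
import java.lang.reflect.Method;

class ErasureDemo {
    // Bounded type parameter: erased to its first bound, Comparable.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Unbounded type parameter: erased to Object.
    static <T> T first(T a, T b) {
        return a;
    }

    // Returns the erased parameter types of the named method in this class.
    static Class<?>[] erasedParams(String name) {
        for (Method m : ErasureDemo.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return m.getParameterTypes();
            }
        }
        throw new IllegalArgumentException("no such method: " + name);
    }

    public static void main(String[] args) {
        System.out.println(erasedParams("max")[0]);
        System.out.println(erasedParams("first")[0]);
    }
}
```

Here `getParameterTypes()` reports the erased types, which is the view the question is describing.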

Is there a way to take the resulting bytecode and decompile it back to a Java file that contains the original type parameters? Does a decompiler exist that can achieve this? Or is the process simply irreversible due to the nature of compilation?

There are 3 answers

Antimony On 31 August 2017 at 23:59

That depends mostly on whether the code has been obfuscated. While it is true that generics use type erasure, compilers typically include source-level information such as generic types as metadata in the classfile for various reasons: reflection, debugging, compilation against closed-source libraries, etc.

So for a well-behaved classfile, it should be possible to get the information back. Whether there are any off-the-shelf tools for this, I don't know. A lot of decompilers do try to recover generic types, but I don't know how reliable they are.

If the code has been obfuscated, the metadata will most likely have been stripped out, in which case there is no hope of recovering the original generic types.

Mangesh kalwale On 31 August 2017 at 21:48

Yes. This is called decompilation: converting bytecode back to its original source code, though only to some extent. Decompilers that do this exist.
You will need the help of a decompiler, plus a little effort of your own, to recover the generic types you describe. This kind of reverse engineering cannot be done with high accuracy, because a compiler goes through several steps to turn source code into bytecode and information is lost along the way. Without tooling, what you get back is a low-level instruction listing that is hard for a human to read, but a decompiler can do much of the work for you. The Java Decompiler project (JD) is the one I have in mind: http://jd.benow.ca. Hope this makes the concept clear!

**Mike Strobel** · Accepted Answer · 2017-09-01T03:22:48+00:00

You are correct that, at the bytecode level, much information gets lost when you define and interact with generic types. Type erasure was nice for preserving compatibility: if you mostly enforce type safety at compile time, you don't need to do much at runtime, so you can reduce generic types to their 'raw' equivalents.

And that's the key: compile-time verification. If you want the flexibility and type safety of generics, your compiler has to know a lot about the generic types you interact with. In many cases, you won't have the source code for those classes, so it has to get the information from somewhere. And it does: metadata. Embedded in the .class file alongside the bytecode is a wealth of information: everything the compiler needs to know you're using generic library types safely. So what kind of generics information gets preserved?

Type variables and constraints

The most basic thing a compiler needs to know in order to consume a generic type is the list of type variables. For any generic type or generic method, the names and positions of the type variables are preserved. Moreover, any bounds declared on those type variables get included as well.
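As a rough illustration, the standard reflection API exposes this preserved information at runtime. The sketch below assumes a hypothetical class `Box` invented for the example; `describe` is likewise just a helper name.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic class used only for this illustration.
class Box<T extends Comparable<T>, U> {
}

class TypeVariableDemo {
    // Renders each declared type variable of a class together with its bounds.
    static List<String> describe(Class<?> cls) {
        List<String> out = new ArrayList<>();
        for (TypeVariable<?> tv : cls.getTypeParameters()) {
            StringBuilder sb = new StringBuilder(tv.getName()).append(" extends ");
            Type[] bounds = tv.getBounds();
            for (int i = 0; i < bounds.length; i++) {
                if (i > 0) {
                    sb.append(" & ");
                }
                sb.append(bounds[i].getTypeName());
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        describe(Box.class).forEach(System.out::println);
    }
}
```

The names `T` and `U` and the `Comparable<T>` bound come back even though the bytecode itself only works with erased types; an unbounded variable reports `java.lang.Object` as its single bound.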

Generic supertype signatures

Sometimes you write a class that extends a generic class or implements a generic interface. If you write a StringList that extends ArrayList<String>, you inherit a lot of functionality. If someone wants to use your StringList as intended and without the source code, it's not enough for the compiler to know that you extended ArrayList; it has to know you extended ArrayList<String>. This applies transitively up the hierarchy: it has to know ArrayList<> extends AbstractList<>, and so on. So this information gets preserved. Your class file will include the complete generic signatures of any generic supertypes (classes or interfaces).
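The StringList case can be sketched with reflection. This is only an illustration of where the information surfaces, not how a compiler or decompiler reads it; `SupertypeDemo` and its helper methods are hypothetical names.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

// The example class from the text.
class StringList extends ArrayList<String> {
}

class SupertypeDemo {
    // The erased view: just ArrayList.
    static Class<?> erasedSuper() {
        return StringList.class.getSuperclass();
    }

    // The generic view recovered from class file metadata: ArrayList<String>.
    static Type genericSuper() {
        return StringList.class.getGenericSuperclass();
    }

    public static void main(String[] args) {
        System.out.println(erasedSuper());
        System.out.println(genericSuper());
        ParameterizedType pt = (ParameterizedType) genericSuper();
        System.out.println(pt.getActualTypeArguments()[0]);
    }
}
```

The type argument `String` is recoverable because it is part of the class declaration, not of any method body.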

Member signatures

The compiler can't verify that you're using a generic type correctly if it doesn't know the full generic types of fields, method parameters and return types. So, you guessed it: that information gets included. If any part of a class member contains a generic type, wildcard, or type variable, that member will get its signature information saved in the metadata.
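A small sketch of the same idea for members, again via reflection. The class `Registry`, its field `index` and its method `collect` are hypothetical and exist only to give the metadata something to describe.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

// Hypothetical class with a generic field and a generic method.
class Registry {
    Map<String, List<Integer>> index;

    <T extends Number> List<? super T> collect(List<? extends T> input) {
        return null; // body is irrelevant; only the signature matters here
    }
}

class MemberSignatureDemo {
    // Full generic type of the field, as stored in the class file metadata.
    static String fieldType() {
        try {
            Field f = Registry.class.getDeclaredField("index");
            return f.getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generic signature of the method, including wildcards and type variables.
    static String methodSignature() {
        try {
            Method m = Registry.class.getDeclaredMethod("collect", List.class);
            return m.toGenericString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldType());
        System.out.println(methodSignature());
    }
}
```

Note that the method is looked up by its erased parameter type, `List.class`, while the printed signature still carries the wildcards.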

Local variables

It's not necessary to preserve information about local variable types in order to consume a type. It can be useful for debugging, but that's about it. There are metadata tables that can be used to record the names and types of variables, and the bytecode ranges at which they exist. Depending on the compiler, they may or may not be written by default. You can force javac to emit them by passing -g:vars, but I believe they're omitted by default.
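One way to check this on a given JDK is to compile a tiny class twice and look for the `LocalVariableTypeTable` attribute name in the resulting class file. The sketch below assumes a full JDK is available (so that `ToolProvider.getSystemJavaCompiler()` returns a compiler) and uses a crude substring search instead of a real class file parser; `LocalVarTableDemo` and the generated `Sample` class are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class LocalVarTableDemo {
    // Source with one generically typed local variable, `names`.
    private static final String SOURCE =
            "import java.util.*;\n"
            + "class Sample {\n"
            + "    int f() {\n"
            + "        List<String> names = new ArrayList<>();\n"
            + "        return names.size();\n"
            + "    }\n"
            + "}\n";

    // Compiles Sample with the given -g flag and reports whether the class
    // file mentions the LocalVariableTypeTable attribute.
    static boolean hasLocalTypeTable(String debugFlag) {
        try {
            Path dir = Files.createTempDirectory("lvtt-demo");
            Path src = dir.resolve("Sample.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));

            // Assumption: running on a JDK, so a system compiler is present.
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            int rc = javac.run(null, null, null,
                    debugFlag, "-d", dir.toString(), src.toString());
            if (rc != 0) {
                throw new IllegalStateException("compilation failed");
            }

            byte[] bytes = Files.readAllBytes(dir.resolve("Sample.class"));
            // Attribute names are stored as plain text in the constant pool.
            String raw = new String(bytes, StandardCharsets.ISO_8859_1);
            return raw.contains("LocalVariableTypeTable");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("-g:none -> " + hasLocalTypeTable("-g:none"));
        System.out.println("-g:vars -> " + hasLocalTypeTable("-g:vars"));
    }
}
```

When the table is present, a decompiler can read `List<String>` for `names` directly instead of inferring it.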

Call sites

One of the biggest issues for decompilers, mostly affecting generic inference within method bodies, is that call sites invoking generic methods retain no information about type arguments. That creates huge headaches for APIs like Java 8 Streams, where generic operators get chained together, each one accepting anonymously typed lambdas (which may be contravariant in their argument types and covariant in their return types). That's a type inference nightmare, but it's an issue for any code that happens to interact with generics. That kind of code doesn't become substantially harder to decompile simply because it exists within a generic type.
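To make the call-site problem concrete, here is a minimal sketch with a hypothetical generic factory method `newList`. The two calls differ only in their type arguments, which is exactly the part that does not survive compilation.

```java
import java.util.ArrayList;
import java.util.List;

class CallSiteDemo {
    // Generic factory method; T appears only in the return type.
    static <T> List<T> newList() {
        return new ArrayList<>();
    }

    // Two call sites with different type arguments. After compilation both
    // are the same instruction: an invokestatic of newList with the erased
    // descriptor ()Ljava/util/List; and no record of String or Integer.
    static boolean sameRuntimeClass() {
        List<String> names = CallSiteDemo.<String>newList();
        List<Integer> counts = CallSiteDemo.<Integer>newList();
        return names.getClass() == counts.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```

Without local variable metadata, a decompiler looking at `sameRuntimeClass` has to guess `String` and `Integer` from how the two lists are used afterwards, and in this method they are not used in any way that would reveal it.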

How this affects decompilation

Modern Java decompilers like Procyon and CFR should be able to reconstruct generic types reasonably well. If the local variable metadata is available, the results should be pretty close to the original code. If not, they'll have to try to infer generic type arguments in method bodies based on data flow analysis. Essentially, the decompiler must look at what data flows in and out of generic instantiations, and use what it knows about the type of that data to guess the type arguments. Sometimes it works really well; other times, not so much (see earlier comment about Java 8 Streams).

At the API level, though—type and member signatures—the results should be spot-on.

Caveats

Strictly speaking, all of the metadata described here is optional: it's only needed at compile time (or decompile time). If someone has run their compiled classes through an obfuscator, optimizer, or some other utility, all of this information could get stripped out. It won't make a difference at runtime.

TL;DR: Conclusion

Yes, it is certainly possible to decompile generic types and methods with their type parameters intact. Assuming the required metadata is present, getting the type and member signatures right is the 'easy' part. Correctly inferring the type arguments of generic instances and method invocations is the tricky bit, but that's a problem for any code that happens to interact with generics.

As mentioned, Procyon and CFR should both do a pretty decent job of restoring generic types and methods.
