I know that, during type erasure, the Java compiler replaces every type parameter in a generic type with its bound, or with Object if the type parameter is unbounded. The resulting bytecode reflects those bounds or Object.
Is there a way to take the resulting bytecode and decompile it back to a Java file that contains the original type parameters of the generic types? Does a decompiler exist that can do this? Or is the process simply irreversible because of the nature of compilation?
You are correct that, at the bytecode level, much information gets lost when you define and interact with generic types. Type erasure was nice for preserving compatibility: if you mostly enforce type safety at compile time, you don't need to do much at runtime, so you can reduce generic types to their 'raw' equivalents.
And that's the key: compile-time verification. If you want the flexibility and type safety of generics, your compiler has to know a lot about the generic types you interact with. In many cases, you won't have the source code for those classes, so it has to get the information from somewhere. And it does: metadata. Embedded in the .class file alongside the bytecode is a wealth of information: everything the compiler needs to know that you're using generic library types safely. So what kind of generics information gets preserved?
Type variables and constraints
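The most basic thing a compiler needs to know in order to consume a generic type is the list of type variables. For any generic type or generic method, the names and positions of the type variables are preserved. Moreover, any constraints on those variables (their declared bounds, such as T extends Comparable<T>) get included as well. Note that type variables can only declare upper bounds; lower bounds appear only on wildcards.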
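One way to see this metadata without a decompiler is through reflection, which reads the same information from the class file. Here is a minimal sketch; Box and TypeVariableDemo are hypothetical names made up for the illustration, not part of any library:

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic class, used only for illustration.
class Box<T extends Comparable<T>, U> {
    T first;
    U second;
}

class TypeVariableDemo {
    // Describes each type variable of a class as "name extends bound & bound".
    static List<String> describeTypeVariables(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (TypeVariable<?> variable : type.getTypeParameters()) {
            List<String> bounds = new ArrayList<>();
            // An unbounded variable reports Object as its single bound.
            for (Type bound : variable.getBounds()) {
                bounds.add(bound.getTypeName());
            }
            result.add(variable.getName() + " extends " + String.join(" & ", bounds));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : describeTypeVariables(Box.class)) {
            System.out.println(line);
        }
    }
}
```

Run against Box, this should list T with Comparable<T> as its bound and U with the implicit bound Object, even though the bytecode of Box itself only ever refers to the erased types.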
Generic supertype signatures
Sometimes you write a class that extends a generic class or implements a generic interface. If you write a StringList that extends ArrayList<String>, you inherit a lot of functionality. If someone wants to use your StringList as intended and without the source code, it's not enough for the compiler to know that you extended ArrayList; it has to know you extended ArrayList<String>. This applies transitively up the hierarchy: it has to know ArrayList<E> extends AbstractList<E>, and so on. So this information gets preserved. Your class file will include the complete generic signatures of any generic supertypes (classes or interfaces).
Member signatures
The compiler can't verify that you're using a generic type correctly if it doesn't know the full generic types of fields, method parameters and return types. So, you guessed it: that information gets included. If any part of a class member contains a generic type, wildcard, or type variable, that member will get its signature information saved in the metadata.
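Both the supertype signatures and the member signatures can be observed through reflection, which reads them from the same class file metadata. The following is a rough sketch: StringList is the class from the example above, while Registry, SignatureDemo, and their members are hypothetical names invented for the illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The StringList from the text: the class file records ArrayList<String>, not just ArrayList.
class StringList extends ArrayList<String> {
}

// Hypothetical class with generic members, used only for illustration.
class Registry<K> {
    Map<K, List<String>> entries;

    <V extends Number> List<V> lookup(K key, Class<V> type) {
        return new ArrayList<>();
    }
}

class SignatureDemo {
    // Full generic signature of the superclass, as stored in the metadata.
    static String genericSuperclassOf(Class<?> type) {
        return type.getGenericSuperclass().getTypeName();
    }

    // Full generic type of a declared field.
    static String genericFieldType(Class<?> owner, String fieldName) {
        try {
            return owner.getDeclaredField(fieldName).getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Full generic return type of a declared method.
    // The method is looked up by its erased parameter types.
    static String genericReturnType(Class<?> owner, String name, Class<?>... erasedParameters) {
        try {
            Method method = owner.getDeclaredMethod(name, erasedParameters);
            return method.getGenericReturnType().getTypeName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(genericSuperclassOf(StringList.class));
        System.out.println(genericFieldType(Registry.class, "entries"));
        // K erases to Object and Class<V> erases to Class.
        System.out.println(genericReturnType(Registry.class, "lookup", Object.class, Class.class));
    }
}
```

Note that lookup has to be found through its erased parameter types (Object and Class), which is what the bytecode descriptor contains; the generic view comes entirely from the extra signature metadata.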
Local variables
It's not necessary to preserve information about local variable types in order to consume a type. It can be useful for debugging, but that's about it. There are metadata tables that can be used to record the names and types of variables, and the bytecode ranges at which they exist. Depending on the compiler, they may or may not be written by default. You can force javac to emit them by passing -g:vars, but I believe they're omitted by default.
Call sites
One of the biggest issues for decompilers, mostly affecting generic inference within method bodies, is that call sites invoking generic methods retain no information about type arguments. That creates huge headaches for APIs like Java 8 Streams, where generic operators get chained together, each one accepting anonymously typed lambdas (which may be contravariant in their argument types and covariant in their return types). That's a type inference nightmare, but it's an issue for any code that happens to interact with generics. That kind of code doesn't become substantially harder to decompile simply because it exists within a generic type.
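To make the call-site problem concrete, here is a small sketch. CallSiteDemo and newList are hypothetical names chosen for the illustration. The comments describe what I would expect javac to emit; the exact bytecode can vary between compiler versions.

```java
import java.util.ArrayList;
import java.util.List;

class CallSiteDemo {
    // Hypothetical generic factory method; T appears only in the signature.
    static <T> List<T> newList() {
        return new ArrayList<>();
    }

    // Returns true if the two lists are indistinguishable by runtime class.
    static boolean sameRuntimeClass() {
        // One call names its type argument explicitly, the other relies on inference.
        // Both compile to an invocation of the same erased method, newList() returning List,
        // so nothing at the call site itself records String or Integer.
        List<String> names = CallSiteDemo.<String>newList();
        List<Integer> counts = newList();

        names.add("a");
        counts.add(1);

        // The only traces left in the method body are the casts the compiler inserts
        // where elements are used (Object to String, Object to Integer), plus the
        // local variable metadata if the class was compiled with -g:vars.
        String first = names.get(0);
        int one = counts.get(0);

        return names.getClass() == counts.getClass() && first.length() == one;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```

A decompiler without local variable metadata has to work backwards from those inserted casts and from the signatures of the methods the values flow into in order to guess that names was a List<String> and counts a List<Integer>.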
How this affects decompilation
Modern Java decompilers like Procyon and CFR should be able to reconstruct generic types reasonably well. If the local variable metadata is available, the results should be pretty close to the original code. If not, they'll have to try to infer generic type arguments in method bodies based on data flow analysis. Essentially, the decompiler must look at what data flows in and out of generic instantiations, and use what it knows about the type of that data to guess the type arguments. Sometimes it works really well; other times, not so much (see earlier comment about Java 8 Streams).
At the API level, though—type and member signatures—the results should be spot-on.
Caveats
Strictly speaking, all of the metadata described here is optional: it's only needed at compile time (or decompile time). If someone has run their compiled classes through an obfuscator, optimizer, or some other utility, all of this information could get stripped out. It won't make a difference at runtime.
TL;DR: Conclusion
Yes, it is certainly possible to decompile generic types and methods with their type parameters intact. Assuming the required metadata is present, getting the type and member signatures right is the 'easy' part. Correctly inferring the type arguments of generic instances and method invocations is the tricky bit, but that's a problem for any code that happens to interact with generics.
As mentioned, Procyon and CFR should both do a pretty decent job of restoring generic types and methods.