I am currently developing a tool that attempts to infer whether a given JAR file is an uber-JAR or a normal JAR.
Among the many heuristics and checks I use to determine the "uberness" of a JAR, the one that is currently not very reliable checks whether the root level of the JAR contains, apart from the META-INF folder, only one folder with .class files.
For example, the root-level structure of the ch.qos.logback:logback-classic:1.1.3 library looks like this:
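A minimal sketch of the root-level check described above, in Java. The class and method names (`RootFolderCheck`, `rootFoldersWithClasses`, `looksLikeUberJar`) are hypothetical, and flagging a JAR as soon as more than one folder is found is an assumption about how the check is applied. Class files sitting directly at the root (such as `module-info.class`) are ignored because they have no enclosing folder.

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class RootFolderCheck {

    /**
     * Collects the top-level folders (other than META-INF) that contain at least
     * one .class file at any depth.
     */
    static Set<String> rootFoldersWithClasses(List<String> entryNames) {
        Set<String> roots = new TreeSet<>();
        for (String name : entryNames) {
            if (!name.endsWith(".class")) {
                continue;
            }
            int slash = name.indexOf('/');
            if (slash < 0) {
                continue; // class file directly at the JAR root, e.g. module-info.class
            }
            String root = name.substring(0, slash);
            if (!root.equals("META-INF")) {
                roots.add(root);
            }
        }
        return roots;
    }

    /** Hypothetical verdict: flag the JAR when more than one such folder exists. */
    static boolean looksLikeUberJar(List<String> entryNames) {
        return rootFoldersWithClasses(entryNames).size() > 1;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (JarFile jar = new JarFile(path)) {
                List<String> names = jar.stream()
                        .map(JarEntry::getName)
                        .collect(Collectors.toList());
                System.out.println(path + " -> " + rootFoldersWithClasses(names)
                        + (looksLikeUberJar(names) ? " (flagged)" : ""));
            }
        }
    }
}
```

Given the root-level layouts listed below for logback-classic 1.1.3 and junit 4.13.1, this sketch would flag both JARs.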
META-INF/
ch/
org/
The org/ folder containing slf4j components is probably needed for this specific version of logback-classic. Later versions, such as 1.4.0, contain only one root-level folder with .class files inside.
A similar situation occurs with the junit:junit:4.13.1 library, which has the following root-level folder structure:
META-INF/
junit/
org/
In this case, the org/ folder also contains a junit/ subfolder, so those classes are part of JUnit's own implementation as well, just under a different package prefix. Although it is not an uber-JAR, my current approach would flag it as one.
I am looking for a better heuristic that can effectively differentiate between normal JARs and uber-JARs. Any suggestions or improvements to my current approach would be greatly appreciated.
One way is to examine each .class file (the format isn’t actually all that complicated) and check for string constants that appear to be class names but aren’t present in the .jar file.
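A rough sketch of this constant-pool scan in Java, assuming Java 14+ (for the arrow-style `switch`) and Java 9+ (for `InputStream.readAllBytes`). It walks each class file's constant pool following the layout in the JVM specification, collects every `CONSTANT_Utf8` entry that matches a simple internal-name pattern (identifier segments separated by `/`), and reports those with no matching `.class` entry in the JAR. The class and method names are hypothetical, and skipping names under `java/` is an added assumption so that JDK references don't swamp the output.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Pattern;

public class ConstantPoolScan {

    // Internal-form name: identifier segments joined by '/', with at least one '/'.
    private static final Pattern CLASS_NAME =
            Pattern.compile("[\\p{L}_$][\\p{L}\\p{N}_$]*(/[\\p{L}_$][\\p{L}\\p{N}_$]*)+");

    static boolean looksLikeClassName(String s) {
        return CLASS_NAME.matcher(s).matches();
    }

    /** Returns every CONSTANT_Utf8 entry in one class file's constant pool. */
    static List<String> utf8Constants(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor_version
            in.readUnsignedShort(); // major_version
            int count = in.readUnsignedShort(); // valid pool indices are 1..count-1
            List<String> strings = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1 -> strings.add(in.readUTF()); // u2 length + modified UTF-8
                    case 7, 8, 16, 19, 20 -> in.skipBytes(2); // Class, String, MethodType, Module, Package
                    case 15 -> in.skipBytes(3); // MethodHandle
                    case 3, 4, 9, 10, 11, 12, 17, 18 -> in.skipBytes(4);
                    case 5, 6 -> { // Long and Double occupy two pool slots
                        in.skipBytes(8);
                        i++;
                    }
                    default -> throw new IllegalArgumentException("unknown tag " + tag);
                }
            }
            return strings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Class-name-like constants referenced by the JAR's classes but not packaged in it. */
    static Set<String> missingClassNames(JarFile jar) throws IOException {
        Set<String> present = new HashSet<>();
        List<JarEntry> classEntries = new ArrayList<>();
        for (JarEntry entry : Collections.list(jar.entries())) {
            String name = entry.getName();
            if (name.endsWith(".class")) {
                present.add(name.substring(0, name.length() - ".class".length()));
                classEntries.add(entry);
            }
        }
        Set<String> missing = new TreeSet<>();
        for (JarEntry entry : classEntries) {
            byte[] bytes;
            try (InputStream in = jar.getInputStream(entry)) {
                bytes = in.readAllBytes();
            }
            for (String s : utf8Constants(bytes)) {
                // Assumption: JDK classes (java/...) are never bundled, so skip them.
                if (looksLikeClassName(s) && !present.contains(s) && !s.startsWith("java/")) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) {
            missingClassNames(jar).forEach(System.out::println);
        }
    }
}
```

Invoke it with a JAR path. As the caveat that follows points out, strings such as `N/A` pass this naive pattern and will show up in the output.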
This is far from perfect; strings like "N/A" and "DES/ECB/NoPadding" look like class names. It could be refined by assuming package names are entirely lowercase, but… I’m pretty sure not every project adheres to that practice.
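The lowercase-package refinement can be sketched as a stricter predicate, again in Java with hypothetical names (`LowercasePackageFilter`, `plausibleClassName`): a candidate is accepted only if it looks like an internal class name and every segment before the last `/` is entirely lowercase. Because this rests on a naming convention, real classes in packages with upper-case letters would be rejected.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class LowercasePackageFilter {

    // Internal-form name: identifier segments joined by '/', with at least one '/'.
    private static final Pattern CLASS_NAME =
            Pattern.compile("[\\p{L}_$][\\p{L}\\p{N}_$]*(/[\\p{L}_$][\\p{L}\\p{N}_$]*)+");

    /** Accepts the name only if every package segment (all but the last) is lowercase. */
    static boolean plausibleClassName(String s) {
        if (!CLASS_NAME.matcher(s).matches()) {
            return false;
        }
        String[] segments = s.split("/");
        for (int i = 0; i < segments.length - 1; i++) {
            String seg = segments[i];
            if (!seg.equals(seg.toLowerCase(Locale.ROOT))) {
                return false; // upper-case letter in a package segment
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"N/A", "DES/ECB/NoPadding", "org/junit/Test", "com/Acme/Widget"};
        for (String s : samples) {
            System.out.println(s + " -> " + plausibleClassName(s));
        }
    }
}
```

This rejects `N/A` and `DES/ECB/NoPadding`, but a lowercase string such as `n/a` would still be accepted.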