Differentiating between a JAR and an uber-JAR


I am currently developing a tool that attempts to infer whether a given JAR file is an uber-JAR or a normal JAR.

Among the heuristics and checks I use to determine the "uberness" of a JAR, the least reliable is the one that checks whether the root level of the JAR contains, other than the META-INF folder, only one folder with .class files.

For example, for the ch.qos.logback:logback-classic:1.1.3 library, the root-level structure looks like this:

META-INF/

ch/

org/

The org/ folder containing slf4j components is probably needed for this specific version of logback-classic. Later versions, such as 1.4.0, only contain one root-level folder that has .class files inside.

Another similar situation occurs with the junit:junit:4.13.1 library. This one has the following root-level folder structure:

META-INF/

junit/

org/

In this case, the org/ folder also contains a junit/ sub-folder, so those classes are part of the actual junit implementation and merely have a different package prefix. Although this is not an uber-JAR, my current methodology would flag it as one.
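For reference, the root-folder check described above could be sketched roughly as follows. This is only an illustration of the heuristic, not the tool's actual code: the class and method names are hypothetical, and it works on a plain list of entry names so that it does not depend on how the JAR is opened.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper illustrating the root-folder heuristic from the question. */
final class RootFolderHeuristic {

    /**
     * Returns the root-level folders (other than META-INF) that contain
     * at least one .class file somewhere beneath them.
     */
    static Set<String> rootFoldersWithClasses(Iterable<String> entryNames) {
        Set<String> roots = new TreeSet<>();
        for (String name : entryNames) {
            int firstSlash = name.indexOf('/');
            // Skip non-class entries and classes in the default package.
            if (!name.endsWith(".class") || firstSlash <= 0) {
                continue;
            }
            String root = name.substring(0, firstSlash);
            if (!root.equals("META-INF")) {
                roots.add(root);
            }
        }
        return roots;
    }

    /** The naive rule: more than one such folder is treated as "uber". */
    static boolean flaggedAsUber(Iterable<String> entryNames) {
        return rootFoldersWithClasses(entryNames).size() > 1;
    }
}
```

With entry names modeled on the junit layout above (for instance junit/framework/Assert.class and org/junit/Assert.class), rootFoldersWithClasses yields both junit and org, so flaggedAsUber returns true even though the JAR is a normal one. The entry names can be collected from a java.util.jar.JarFile, for example via its stream() of entries.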

I am looking for a better heuristic that can effectively differentiate between normal JARs and uber-JARs. Any suggestions or improvements to my current approach would be greatly appreciated.

Answer by VGR

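One way is to look at each .class file (the format isn’t actually all that complicated) and check its constant pool for string constants that look like class names but aren’t in the .jar file. If such a reference turns up, the .jar depends on classes outside itself and is probably not an uber-JAR.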

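import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches slash-separated Java identifiers with at least two components,
// i.e. the internal form of a class name, such as org/slf4j/Logger.
private static final Pattern CLASS_NAME = Pattern.compile(
    "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*" +
    "(?:/\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*)+");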

private static final byte CONSTANT_Utf8 = 1;
private static final byte CONSTANT_Integer = 3;
private static final byte CONSTANT_Float = 4;
private static final byte CONSTANT_Long = 5;
private static final byte CONSTANT_Double = 6;
private static final byte CONSTANT_Class = 7;
private static final byte CONSTANT_String = 8;
private static final byte CONSTANT_Fieldref = 9;
private static final byte CONSTANT_Methodref = 10;
private static final byte CONSTANT_InterfaceMethodref = 11;
private static final byte CONSTANT_NameAndType = 12;
private static final byte CONSTANT_MethodHandle = 15;
private static final byte CONSTANT_MethodType = 16;
private static final byte CONSTANT_Dynamic = 17;
private static final byte CONSTANT_InvokeDynamic = 18;
private static final byte CONSTANT_Module = 19;
private static final byte CONSTANT_Package = 20;

private static boolean lastComponentIsUpperCase(String s) {
    int lastSlash = s.lastIndexOf('/');
    return lastSlash >= 0 && lastSlash < s.length() - 1 &&
        Character.isUpperCase(s.charAt(lastSlash + 1));
}

public Optional<String> findExternalClassReferenceIn(File file)
throws IOException {
    try (JarFile jar =
        new JarFile(file, false, JarFile.OPEN_READ, Runtime.version())) {

        Set<String> classes = new HashSet<>();

        Enumeration<JarEntry> entries;

        entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            String name = entry.getRealName();
            if (name.endsWith(".class")) {
                classes.add(name.substring(0, name.length() - 6));
            }
        }

        Matcher classNameMatcher = CLASS_NAME.matcher("");

        entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            String name = entry.getRealName();
            if (!name.endsWith(".class")) {
                continue;
            }

            try (DataInputStream classFile = new DataInputStream(
                new BufferedInputStream(
                    jar.getInputStream(entry)))) {

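                int magic = classFile.readInt();
                if (magic != 0xCAFEBABE) {
                    throw new IOException("Not a class file: " + name);
                }
                classFile.readUnsignedShort(); // minor version (unused)
                classFile.readUnsignedShort(); // major version (unused)
                // constant_pool_count is an unsigned 16-bit value, so it
                // must not be read as a signed short.
                int constantPoolSize = classFile.readUnsignedShort() - 1;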
                for (int i = 1; i <= constantPoolSize; i++) {
                    byte tag = classFile.readByte();
                    switch (tag) {
                        case CONSTANT_Utf8:
                            String s = classFile.readUTF();
                            if (!s.startsWith("java") &&
                                !s.startsWith("com/sun/") &&
                                !s.startsWith("jdk/") &&
                                !s.startsWith("org/w3c/") &&
                                !s.startsWith("org/xml/") &&
                                !s.startsWith("org/ietf/jgss") &&
                                lastComponentIsUpperCase(s) &&
                                classNameMatcher.reset(s).matches() &&
                                !classes.contains(s)) {

                                // Found reference to a class name
                                // that isn't in this jar.
                                return Optional.of(s);
                            }
                            break;
                        case CONSTANT_Integer:
                            classFile.readInt();
                            break;
                        case CONSTANT_Float:
                            classFile.readFloat();
                            break;
                        case CONSTANT_Long:
                            classFile.readLong();
                            i++;
                            break;
                        case CONSTANT_Double:
                            classFile.readDouble();
                            i++;
                            break;
                        case CONSTANT_Class:
                            classFile.readShort();
                            break;
                        case CONSTANT_String:
                            classFile.readShort();
                            break;
                        case CONSTANT_Fieldref:
                            classFile.readShort();
                            classFile.readShort();
                            break;
                        case CONSTANT_Methodref:
                            classFile.readShort();
                            classFile.readShort();
                            break;
                        case CONSTANT_InterfaceMethodref:
                            classFile.readShort();
                            classFile.readShort();
                            break;
                        case CONSTANT_NameAndType:
                            classFile.readShort();
                            classFile.readShort();
                            break;
                        case CONSTANT_MethodHandle:
                            classFile.readByte();
                            classFile.readShort();
                            break;
                        case CONSTANT_MethodType:
                            classFile.readShort();
                            break;
                        case CONSTANT_Dynamic:
                            classFile.readShort();
                            classFile.readShort();
                            break;
                        case CONSTANT_InvokeDynamic:
                            classFile.readShort();
                            classFile.readShort();
                            break;
                        case CONSTANT_Module:
                            classFile.readShort();
                            break;
                        case CONSTANT_Package:
                            classFile.readShort();
                            break;
                        default:
                            throw new IOException(
                                "Unknown constant pool tag " + tag);
                    }
                }
            }
        }
    }

    return Optional.empty();
}

This is far from perfect; strings like "N/A" and "DES/ECB/NoPadding" look like class names. It could be refined by assuming package names are entirely lowercase, but… I’m pretty sure not every project adheres to that practice.
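As a rough sketch of that refinement, a stricter filter could require every package segment to be lowercase and the simple class name to start with an uppercase letter. The class name ClassNameFilter and the method looksLikeClassName are hypothetical, and the pattern is deliberately ASCII-only, which is an assumption that real projects may violate.

```java
import java.util.regex.Pattern;

/** Hypothetical stricter filter; assumes package segments are all lowercase. */
final class ClassNameFilter {

    // One or more lowercase package segments, each followed by '/',
    // then a simple class name starting with an uppercase letter.
    // '$' is allowed so that nested classes such as a/b/Outer$Inner match.
    private static final Pattern LIKELY_CLASS_NAME = Pattern.compile(
        "(?:[a-z_][a-z0-9_]*/)+[A-Z][A-Za-z0-9_$]*");

    static boolean looksLikeClassName(String s) {
        return LIKELY_CLASS_NAME.matcher(s).matches();
    }
}
```

Under this assumption, "N/A" and "DES/ECB/NoPadding" are rejected, while names like org/slf4j/Logger are still accepted. The trade-off is that references into packages with uppercase letters would be missed, which biases the result toward "no external reference found".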