Occasional java.lang.VerifyError on application startup or workflow execution


In an application that has been stable for several months, we recently started seeing a java.lang.VerifyError being thrown, sometimes at application startup and sometimes during a workflow execution (after the same application has launched successfully).

Here is an example of what we see:

java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    com/example/XYZController.doSomething()Lorg/springframework/http/ResponseEntity; @383: invokevirtual
  Reason:
    Type 'java/lang/Throwable' (current frame, stack[2]) is not assignable to 'java/lang/Exception'
  Current Frame:
    bci: @383
    flags: { }
    locals: { 'com/example/XYZController', 'java/util/List', 'java/lang/Throwable', top, top, 'org/aspectj/lang/JoinPoint' }
    stack: { 'org/slf4j/Logger', 'java/lang/String', 'java/lang/Throwable' }
  Bytecode:
    0000000: .... .... .... .... .... .... .... ....
    0000010: .... .... .... .... .... .... .... ....
    0000020: .... .... .... .... .... .... .... ....
    0000030: .... .... .... .... .... .... .... ....     
  Exception Handler Table:
    bci [357, 371] => handler: 374
    bci [357, 371] => handler: 374
    bci [357, 371] => handler: 374
  Stackmap Table:
    full_frame(@56,{Object[#231],Top,Top,Top,Top,Top,Top,Object[#624]},{Object[#555],Object[#624],Object[#558],Object[#626]})
    full_frame(@65,{Object[#231],Integer,Top,Top,Top,Top,Top,Object[#624]},{})
    same_frame(@98)
    full_frame(@167,{Object[#231],Object[#232],Object[#233],Object[#306],Integer,Object[#311],Top,Object[#624]},{})
    same_frame(@206)
    full_frame(@209,{Object[#231],Object[#232],Object[#233],Object[#306],Top,Top,Top,Object[#624]},{})
    full_frame(@219,{Object[#231],Object[#232],Object[#233],Object[#306],Integer,Top,Top,Object[#624]},{})
    same_frame_extended(@313)
    same_locals_1_stack_item_frame(@374,Object[#622])
    full_frame(@393,{Object[#231],Object[#232],Object[#233],Object[#306],Top,Top,Top,Object[#624]},{})

The above is just one occurrence, but we see this happen in multiple classes at different times.

By analyzing the bytecode using javap, we see that the VerifyError always points into a multi-catch block that combines checked and unchecked exceptions.

Like:

try {
    ...
    ...
} catch (IOException | IllegalArgumentException ex) {
    doSomething(ex.getMessage()); // ex.getMessage() is where the VerifyError points to (shown below)
}

The corresponding javap output is:

383: invokevirtual #32  // Method java/lang/Exception.getMessage:()Ljava/lang/String;

The call to ex.getMessage() and the fact that the catch block combines checked and unchecked exceptions are the only things common to all occurrences of the VerifyError.

I understand that a VerifyError is usually seen when libraries built with different versions of the JDK interact, but we have more or less ruled that out, partly because of the intermittent nature of the error.

Also, we use aspectjweaver for load-time weaving, and the intermittent nature of the error led us to suspect an issue with aspectjweaver itself (as noted here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=550705). We are on v1.9.2 as well, so we're now evaluating the behavior after upgrading aspectjweaver to 1.9.6, as suggested in the Eclipse thread.

While we're still evaluating this change, we also split the multi-catch handler into separate catch blocks, one per exception type.
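For reference, the two catch styles look roughly like this. It is a minimal sketch, not our actual code: `CatchStyles`, `riskyCall`, and both method names are hypothetical. In the multi-catch version the static type of `ex` is the closest common supertype, `java.lang.Exception`, which matches the `Exception.getMessage` reference in the javap output above. In the split version each handler works with its own concrete exception type.

```java
import java.io.IOException;

final class CatchStyles {

    // Hypothetical stand-in for the real work inside the try block.
    static void riskyCall(int mode) throws IOException {
        if (mode == 1) {
            throw new IOException("io failure");
        }
        if (mode == 2) {
            throw new IllegalArgumentException("bad argument");
        }
    }

    // Original style: a single handler; ex is typed as the common supertype Exception.
    static String multiCatch(int mode) {
        try {
            riskyCall(mode);
            return "ok";
        } catch (IOException | IllegalArgumentException ex) {
            return ex.getMessage();
        }
    }

    // Workaround style: one handler per exception type.
    static String separateCatch(int mode) {
        try {
            riskyCall(mode);
            return "ok";
        } catch (IOException ex) {
            return ex.getMessage();
        } catch (IllegalArgumentException ex) {
            return ex.getMessage();
        }
    }
}
```

Both methods behave identically at the source level; the only difference is the shape of the bytecode and exception table that the weaver and the verifier get to see.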

So far, after the changes above and with the limited number of restarts of the application, we have not seen it crash with the verify error, but there's no way for us to be confident that we have indeed fixed the root cause. So, we'll probably have to let it play out for a while.

In the meantime, I'd like to check if anyone here has seen similar behaviors that they were able to fix with confidence and also try to get the details of the fix.

EDIT (based on questions posed by kriegaex):


Java version being used

Runtime Java version: openjdk version "11.0.10" 2021-01-19 LTS
Compile time Java version: openjdk version "1.8.0_382"
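To double-check which JDK level a given class was actually compiled for, the class-file header can be read directly. The following is a rough sketch; `ClassFileVersion` and its method names are made up for illustration. A major version of 52 corresponds to Java 8 and 55 to Java 11. Note that it reads the class file as packaged on the classpath, not the bytes produced by load-time weaving.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class ClassFileVersion {

    // Parses the first 8 bytes of a class file: magic (4), minor (2), major (2).
    static int majorVersion(byte[] classBytes) {
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default
        if (classBytes.length < 8 || header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file");
        }
        header.getShort();                  // minor version, ignored here
        return header.getShort() & 0xFFFF;  // major version: 52 = Java 8, 55 = Java 11
    }

    // Reads the header of the .class resource that backs an already loaded class.
    static int majorVersionOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            return majorVersion(header);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ClassFileVersion.majorVersionOf(SomeController.class)` for a class of interest (the class name here is a placeholder) would show whether it was built with the expected compiler target.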

We'll soon be moving to JDK 11 for both compile time and runtime. This is a product with a lot of legacy code, and the pace of upgrades, especially JDK upgrades, will be very slow.

The reason for attempting an upgrade from one outdated AspectJ version (1.9.2) to another outdated version (1.9.6) is solely the Eclipse thread, which describes a problem very similar to ours (https://bugs.eclipse.org/bugs/show_bug.cgi?id=550705). If we knew for a fact that this is not the root cause, we'd likely not do the upgrade at all, as doing so involves running regression tests on 40 different components owned by teams across the world (mostly automated, but there are still quite a few manual tests). Having said that, if we knew for a fact that the AspectJ versions are completely backward compatible, that might enable us to move to the latest available version.

One more data point that I missed providing earlier: a major change that recently went into effect in our product was the upgrade of Jackson from v2.7.9 to v2.12.7. There were a lot of (documented) backward incompatibilities, and a lot of legacy code had to be modified to perform this upgrade. There's a theory that this upgrade is exposing problems in the product that were long hidden, but it's just a theory.

I am unable to provide a reproducer because we're not able to reproduce this locally. It's always a coin toss as to whether the error happens or not. Sometimes, it happens at application startup, sometimes after the application starts up successfully, and sometimes never - all with the same codebase.
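One thing we are considering to make the failure easier to catch is forcing the suspect classes to load at startup, so that weaving and verification happen at a known point instead of on first use in the middle of a workflow. The sketch below is only an idea, under the assumption that the error depends on when a class is first loaded and woven; `EagerLinkProbe` is a hypothetical helper, and initializing classes early also runs their static initializers, which may have side effects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EagerLinkProbe {

    // Loads and initializes each named class through the given loader.
    // Returns the classes that failed, mapped to the error they raised.
    static Map<String, Throwable> probe(List<String> classNames, ClassLoader loader) {
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                // initialize = true forces linking, which includes bytecode verification.
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // VerifyError is a subclass of LinkageError, so it ends up here.
                failures.put(name, e);
            }
        }
        return failures;
    }
}
```

Logging the returned map at startup would at least tell us which classes fail in a given run, even if it does not explain why.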

One more edit


Stack Overflow was wise enough to show this as a related problem: "java.lang.VerifyError: Stack map does not match the one at exception handler"

This is about as close to our problem as I can expect to find, although we've all but ruled out conflicting Jackson libraries. We'll dig in further.
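One way we could check for conflicting Jackson libraries is to ask the class loader for every location that provides a given class. This is a minimal sketch using the standard `ClassLoader.getResources` call; `ClasspathDuplicates` and `locationsOf` are names invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ClasspathDuplicates {

    // Returns every location from which the loader can serve the given class.
    // More than one entry suggests the class is packaged in several jars.
    static List<URL> locationsOf(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ClasspathDuplicates.locationsOf("com.fasterxml.jackson.databind.ObjectMapper", Thread.currentThread().getContextClassLoader())` returning more than one URL would point to duplicate Jackson jars visible to that loader.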
