I'm experimenting with clustered Vert.x (vertx 3.6.0 and vertx-hazelcast 3.6.2). So far I have implemented a toy distributed registry {String => String} that runs on a few nodes and offers basic operations (set, get, del) triggered over HTTP. For example, curl http://localhost:8081/set/key1/val1 asks the node listening on port 8081 to store the value val1 under the key key1. The data store for the registry is an AsyncMap that Vert.x provides via vertx.sharedData().<String, String>getAsyncMap(...). I haven't configured anything related to Hazelcast, so I'm using the default configuration that ships with the library. Test cases such as node failure and recovery, adding new nodes, and adding several nodes simultaneously all work fine.
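
As a rough illustration of the three operations (a sketch only, not the actual application code), the request handling can be pictured as follows. The class name ToyRegistry, the handle method, and the response strings are hypothetical, and a local ConcurrentHashMap stands in for the cluster-wide AsyncMap so that the snippet runs without Vert.x or Hazelcast.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the toy registry; names and responses are assumptions.
class ToyRegistry {
    // Local placeholder for the cluster-wide AsyncMap<String, String>.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Dispatches a request path such as "/set/key1/val1" and returns the response body.
    String handle(String path) {
        // "/set/key1/val1" splits into ["", "set", "key1", "val1"].
        String[] parts = path.split("/");
        if (parts.length < 3) {
            return "ERROR: bad request";
        }
        String op = parts[1];
        String key = parts[2];
        switch (op) {
            case "set":
                if (parts.length != 4) {
                    return "ERROR: bad request";
                }
                store.put(key, parts[3]);
                return "OK";
            case "get":
                String value = store.get(key);
                return value == null ? "NOT FOUND" : value;
            case "del":
                return store.remove(key) == null ? "NOT FOUND" : "OK";
            default:
                return "ERROR: unknown operation";
        }
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        System.out.println(registry.handle("/set/key1/val1"));
        System.out.println(registry.handle("/get/key1"));
        System.out.println(registry.handle("/del/key1"));
        System.out.println(registry.handle("/get/key1"));
    }
}
```

In the real application each of these map calls is asynchronous and goes through the AsyncMap shared by all cluster members, which is why the map contents have to be re-synchronized when a node rejoins.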

One of my test cases is updating the application with no downtime. It goes as follows:

  1. Have some nodes running behind an HTTP load balancer that distributes the requests across the nodes.
  2. Update the application (add some functionality and recompile it). For the sake of the example, I implemented a keys function that returns the list of keys defined in the registry.
  3. Remove node_1 from the load balancer and stop it.
  4. Restart node_1 with the new fat-jar, let it join the cluster and re-synchronize the map, and re-enable it in the load balancer.
  5. Do the same for all nodes.
  6. Enable the clients to use the new functionality.

The issue happens at step 4. The updated node usually restarts without logging any error, but errors and exceptions appear on the nodes that have not been updated yet. When that happens, the map on the new node is not synchronized (it is empty) and the old nodes stop responding to HTTP requests. The failure is not systematic: sometimes everything goes fine, but most of the time it breaks.

For example:

févr. 03, 2019 7:56:54 AM com.hazelcast.internal.partition.operation.MigrationRequestOperation
WARNING: [192.168.8.149]:5701 [dev] [3.10.5] Error while executing beforeMigration()
java.lang.NoClassDefFoundError: com/hazelcast/map/impl/querycache/publisher/AccumulatorSweeper
    at com.hazelcast.map.impl.MapMigrationAwareService.flushAndRemoveQueryCaches(MapMigrationAwareService.java:104)
    at com.hazelcast.map.impl.MapMigrationAwareService.beforeMigration(MapMigrationAwareService.java:88)
    at com.hazelcast.spi.impl.CountingMigrationAwareService.beforeMigration(CountingMigrationAwareService.java:76)
    at com.hazelcast.map.impl.MapService.beforeMigration(MapService.java:147)
    at com.hazelcast.internal.partition.operation.BaseMigrationOperation.executeBeforeMigrations(BaseMigrationOperation.java:180)
    at com.hazelcast.internal.partition.operation.MigrationRequestOperation.executeBeforeMigrations(MigrationRequestOperation.java:186)
    at com.hazelcast.internal.partition.operation.MigrationRequestOperation.run(MigrationRequestOperation.java:216)
    at com.hazelcast.spi.Operation.call(Operation.java:148)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:202)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:191)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:405)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:115)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100)

Here is another example of what I get when running the same process with the same source code:

févr. 03, 2019 8:24:30 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [192.168.8.149]:5701 [dev] [3.10.5] Established socket connection between /192.168.8.149:5701 and /192.168.8.149:34634
févr. 03, 2019 8:24:32 AM com.hazelcast.internal.networking.nio.iobalancer.IOBalancer
SEVERE: [192.168.8.149]:5701 [dev] [3.10.5] IOBalancer failed
java.lang.NoClassDefFoundError: com/hazelcast/internal/networking/nio/NioInboundPipeline$StartMigrationTask
    at com.hazelcast.internal.networking.nio.NioInboundPipeline.requestMigration(NioInboundPipeline.java:115)
    at com.hazelcast.internal.networking.nio.iobalancer.IOBalancer.tryMigrate(IOBalancer.java:212)
    at com.hazelcast.internal.networking.nio.iobalancer.IOBalancer.scheduleMigrationIfNeeded(IOBalancer.java:153)
    at com.hazelcast.internal.networking.nio.iobalancer.IOBalancer.checkInboundPipelines(IOBalancer.java:146)
    at com.hazelcast.internal.networking.nio.iobalancer.IOBalancerThread.run(IOBalancerThread.java:50)
Caused by: java.lang.ClassNotFoundException: com.hazelcast.internal.networking.nio.NioInboundPipeline$StartMigrationTask
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    ... 5 more

Any ideas?