ActiveMQ broker stop when zookeeper leader stop

387 views Asked by At

I m working with servicemix and i have create a zookeeper server bundle that start correctly my zkserver. I run 3 servicemix with each one have a zk server bundle so i have 3 zkserver.

I have done : echo stat | nc localhost 2181 2182 and 2183 and i have one leader and 2 followers

If i stop the leader, an election is done and another one become the leader with one follower, if i restart the stopped one, it become a follower.

I thinkZookeeper part is ok.

Now, let see how the embedded servicemix activemq broker works.

I have my 3 zkserver up, and one master broker elected correctly. If i execute karaf command activemq:bstat, i see on 2 servicemix

service@localhost>bstat
Uptime = 26 minutes

and on the master broker:

service@localhost>bstat

destinationName = Consumer.ZOOKEEPER.VirtualTopic.application.downward
destinationType = Queue

destinationName = ActiveMQ.Advisory.TempQueue_ActiveMQ.Advisory.TempTopic
destinationType = Topic

connectorName = openwire
...
...
many other things...
...

Its look like ok.

Now, if i stop my zkserver leader, i see my master broker stopped too and bstat don't show me anything! and no other broker start one the 2 others servicemix.

LOG when zkserver leader stop

in this log we can see the zkserver stop but i can't see when the broker is stopped and why it's stop because when i stop the zkserver leader, another election is done and another one become the leader, so the broker don't need to stop...

2016-12-19 10:23:18,208 | INFO  | l Console Thread | ZookeeperStarter                 | 284 - zookeeper-server - 1.0.0.SNAPSHOT | Shutting down ZooKeeper server
2016-12-19 10:23:18,208 | INFO  | l Console Thread | Leader                           | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
2016-12-19 10:23:18,209 | INFO  | l Console Thread | Leader                           | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
    at     org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:828)[94:org.apache.hadoop.zookeeper:3.4.6]
    at com.zookeeper.provider.ZookeeperStarter$MyQuorumPeerMain.shutdown(ZookeeperStarter.java:185)[284:zookeeper-server:1.0.0.SNAPSHOT]
    at com.zookeeper.provider.ZookeeperStarter.shutdown(ZookeeperStarter.java:73)[284:zookeeper-server:1.0.0.SNAPSHOT]
    at com.zookeeper.provider.Activator.stop(Activator.java:71)[284:zookeeper-server:1.0.0.SNAPSHOT]
    at org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667)[org.apache.felix.framework-4.4.1.jar:]
    at org.apache.felix.framework.Felix.stopBundle(Felix.java:2538)[org.apache.felix.framework-4.4.1.jar:]
    at org.apache.felix.framework.BundleImpl.stop(BundleImpl.java:1012)[org.apache.felix.framework-4.4.1.jar:]
    at org.apache.felix.framework.BundleImpl.stop(BundleImpl.java:999)[org.apache.felix.framework-4.4.1.jar:]
    at org.apache.karaf.shell.osgi.StopBundle.doExecute(StopBundle.java:44)[25:org.apache.karaf.shell.osgi:2.4.3]
    at org.apache.karaf.shell.osgi.BundlesCommand.doExecute(BundlesCommand.java:37)[25:org.apache.karaf.shell.osgi:2.4.3]
    at org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:477)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:403)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:92)[23:org.apache.karaf.shell.console:2.4.3]
    at org.apache.karaf.shell.console.jline.Console.run(Console.java:195)[23:org.apache.karaf.shell.console:2.4.3]
    at  org.apache.karaf.shell.console.jline.DelayedStarted.run(DelayedStarted.java:79)[23:org.apache.karaf.shell.console:2.4.3]
2016-12-19 10:23:18,210 | INFO  | l Console Thread | NIOServerCnxn                    | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Closed socket connection for client /10.149.58.103:45947 which had sessionid 0x15916627e620001
2016-12-19 10:23:18,210 | INFO  | Thread-68        | Leader                           | 94 - org.apache.hadoop.zookeeper - 3.4.6 | exception while shutting down acceptor: java.net.SocketException: Socket closed
2016-12-19 10:23:18,211 | INFO  | it.sncf.fr:2181) | ClientCnxn                       | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Unable to read additional data from server sessionid 0x15916627e620001, likely server has closed socket, closing socket connection and attempting reconnect
2016-12-19 10:23:18,211 | INFO  | l Console Thread | NIOServerCnxn                    | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Closed socket connection for client /10.149.58.103:45946 which had sessionid 0x15916627e620000
2016-12-19 10:23:18,212 | INFO  | l Console Thread | ZooKeeperServer                  | 94 - org.apache.hadoop.zookeeper - 3.4.6 | shutting down
2016-12-19 10:23:18,212 | INFO  | l Console Thread | SessionTrackerImpl               | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
2016-12-19 10:23:18,212 | INFO  | l Console Thread | PrepRequestProcessor             | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
2016-12-19 10:23:18,213 | INFO  | l Console Thread | ProposalRequestProcessor         | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
2016-12-19 10:23:18,213 | INFO  | l Console Thread | CommitProcessor                  | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
 2016-12-19 10:23:18,213 | INFO  | sid:1 cport:-1): | PrepRequestProcessor             | 94 - org.apache.hadoop.zookeeper - 3.4.6 | PrepRequestProcessor exited loop!
 2016-12-19 10:23:18,213 | INFO  | l Console Thread | Leader                           | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
 2016-12-19 10:23:18,214 | INFO  | l Console Thread | FinalRequestProcessor            | 94 - org.apache.hadoop.zookeeper - 3.4.6 | shutdown of request processor complete
 2016-12-19 10:23:18,214 | INFO  | ommitProcessor:1 | CommitProcessor                  | 94 - org.apache.hadoop.zookeeper - 3.4.6 | CommitProcessor exited loop!
 2016-12-19 10:23:18,214 | INFO  | l Console Thread | SyncRequestProcessor             | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
 2016-12-19 10:23:18,214 | INFO  | SyncThread:1     | SyncRequestProcessor             | 94 - org.apache.hadoop.zookeeper - 3.4.6 | SyncRequestProcessor exited!
 2016-12-19 10:23:18,225 | WARN  | 149.58.103:36413 | LearnerHandler                   | 94 - org.apache.hadoop.zookeeper - 3.4.6 | ******* GOODBYE /10.149.58.103:36413 ********
 2016-12-19 10:23:18,226 | WARN  | 149.58.103:36413 | LearnerHandler                   | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Ignoring unexpected exception
java.lang.InterruptedException
    at  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)[:1.7.0_71]
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)[:1.7.0_71]
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)[94:org.apache.hadoop.zookeeper:3.4.6]
 2016-12-19 10:23:18,229 | WARN  | 149.58.103:36408 | LearnerHandler                   | 94 - org.apache.hadoop.zookeeper - 3.4.6 | ******* GOODBYE /10.149.58.103:36408 ********
 2016-12-19 10:23:18,229 | WARN  | 149.58.103:36408 | LearnerHandler                   | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Ignoring unexpected exception 
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)[:1.7.0_71]
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)[:1.7.0_71]
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,232 | INFO  | 0.0/0.0.0.0:2181 | NIOServerCnxnFactory             | 94 - org.apache.hadoop.zookeeper - 3.4.6 | NIOServerCnxn factory exited run method
2016-12-19 10:23:18,233 | WARN  | 0:0:0:0:0:0:2181 | QuorumPeer                       | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Unexpected exception
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:453)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,233 | INFO  | 0:0:0:0:0:0:2181 | Leader                           | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Shutting down
2016-12-19 10:23:18,234 | WARN  | 0:0:0:0:0:0:2181 | QuorumPeer                       | 94 - org.apache.hadoop.zookeeper - 3.4.6 | QuorumPeer main thread exited
2016-12-19 10:23:18,235 | ERROR | .149.58.103:3888 | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Exception while listening
java.net.SocketException: Socket closed
    at java.net.PlainSocketImpl.socketAccept(Native Method)[:1.7.0_71]
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)[:1.7.0_71]
    at java.net.ServerSocket.implAccept(ServerSocket.java:530)[:1.7.0_71]
    at java.net.ServerSocket.accept(ServerSocket.java:498)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:509)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,237 | WARN  | RecvWorker:3     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Connection broken for id 3, my id = 1, error =
java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)[:1.7.0_71]
    at java.net.SocketInputStream.read(SocketInputStream.java:152)[:1.7.0_71]
    at java.net.SocketInputStream.read(SocketInputStream.java:122)[:1.7.0_71]
    at java.net.SocketInputStream.read(SocketInputStream.java:210)[:1.7.0_71]
    at java.io.DataInputStream.readInt(DataInputStream.java:387)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,237 | WARN  | RecvWorker:3     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Interrupting SendWorker
2016-12-19 10:23:18,238 | WARN  | SendWorker:2     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)[:1.7.0_71]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)[:1.7.0_71]
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,239 | WARN  | RecvWorker:2     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Connection broken for id 2, my id = 1, error =
java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)[:1.7.0_71]
    at java.net.SocketInputStream.read(SocketInputStream.java:152)[:1.7.0_71]
    at java.net.SocketInputStream.read(SocketInputStream.java:122)[:1.7.0_71]
    at java.net.SocketInputStream.read(SocketInputStream.java:210)[:1.7.0_71]
    at java.io.DataInputStream.readInt(DataInputStream.java:387)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,239 | WARN  | RecvWorker:2     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Interrupting SendWorker
2016-12-19 10:23:18,240 | WARN  | SendWorker:2     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Send worker leaving thread
2016-12-19 10:23:18,241 | WARN  | SendWorker:3     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)[:1.7.0_71]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)[:1.7.0_71]
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)[:1.7.0_71]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)[94:org.apache.hadoop.zookeeper:3.4.6]
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)[94:org.apache.hadoop.zookeeper:3.4.6]
2016-12-19 10:23:18,241 | WARN  | SendWorker:3     | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Send worker leaving thread
2016-12-19 10:23:18,321 | WARN  | spatch-DEFAULT-2 | MasterLevelDBStore               | 96 - org.apache.activemq.activemq-osgi - 5.12.1 | Unexpected session error: java.io.IOException: Connection reset by peer
2016-12-19 10:23:18,330 | WARN  | spatch-DEFAULT-3 |  MasterLevelDBStore               | 96 - org.apache.activemq.activemq-osgi - 5.12.1 | Unexpected session error: java.io.IOException: Connection reset by peer
2016-12-19 10:23:18,583 | INFO  | -broker] Task-14 | ZooKeeper                        | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Session: 0x15916627e620001 closed
  2016-12-19 10:23:18,584 | INFO  | er])-EventThread | ClientCnxn                       | 94 - org.apache.hadoop.zookeeper - 3.4.6 | EventThread shut down
2016-12-19 10:23:19,236 | INFO  | .149.58.103:3888 | QuorumCnxManager                 | 94 - org.apache.hadoop.zookeeper - 3.4.6 | Leaving listener
2016-12-19 10:23:19,803 | INFO  | Receiver[myid=1] | FastLeaderElection               | 94 - org.apache.hadoop.zookeeper - 3.4.6 | WorkerReceiver is down
2016-12-19 10:23:19,803 | INFO  | erSender[myid=1] | FastLeaderElection               | 94 - org.apache.hadoop.zookeeper - 3.4.6 | WorkerSender is down
2016-12-19 10:23:20,000 | INFO  | SessionTracker   | SessionTrackerImpl               | 94 - org.apache.hadoop.zookeeper - 3.4.6 | SessionTrackerImpl exited loop!

Zk server configuration

Only myid change to 1 / 2 / 3 and karaf.data is different because myid file in dataDir is unique and contain 1 or 2 or 3

tickTime=2000    
initLimit=10
syncLimit=10
clientPort=2183
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
myId=3
dataDir=${karaf.data}/zookeeper/

Activemq replicated leveldb configuration

 <persistenceAdapter>
    <replicatedLevelDB
        directory="${data}"
        replicas="3"
        bind="tcp://0.0.0.0:61622"
        zkAddress="localhost:2181,localhost:2182,localhost:2183"
        zkPassword="password"
        zkPath="/activemq/leveldb-stores"
        hostname="localhost"
        />
    </persistenceAdapter>

version

Servicemix 5.5.4 ActiveMQ : bundle osgi : activemq-osgi (5.12.1) Zookeeper : bundle osgi : ZooKeeper Bundle (3.4.6)

thanks for help to understand why my master broker stop and why no other broker start after zk server leader election is done.

0

There are 0 answers