I have an Artemis cluster in Kubernetes with 3 master/slave groups:
activemq-artemis-master-0 1/1 Running
activemq-artemis-master-1 1/1 Running
activemq-artemis-master-2 1/1 Running
activemq-artemis-slave-0 0/1 Running
activemq-artemis-slave-1 0/1 Running
activemq-artemis-slave-2 0/1 Running
I am using a Spring Boot JmsListener to consume messages sent to a wildcard queue, as follows:
import java.util.concurrent.TimeUnit;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.stereotype.Component;
import lombok.extern.log4j.Log4j2;

@Component
@Log4j2
public class QueueListener {

    @Autowired
    private ListenerControl listenerControl;

    @JmsListener(id = "queueListener0", destination = "QUEUE.service2.*.*.*.notification")
    public void add(String message, @Header("sentBy") String sentBy, @Header("sentFrom") String sentFrom, @Header("sentAt") Long sentAt) throws InterruptedException {
        log.info("---QUEUE[notification]: message={}, sentBy={}, sentFrom={}, sentAt={}",
                message, sentBy, sentFrom, sentAt);
        TimeUnit.MILLISECONDS.sleep(listenerControl.getDuration());
    }
}
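ListenerControl is not shown in the post; a minimal hypothetical sketch, assuming it just exposes the sleep duration used above:

import org.springframework.stereotype.Component;

@Component
public class ListenerControl {
    // Hypothetical: sleep duration (ms) applied by the listener, adjustable at runtime
    private volatile long duration = 1000L;

    public long getDuration() {
        return duration;
    }

    public void setDuration(long duration) {
        this.duration = duration;
    }
}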
There were 20 messages sent to the queue, and master-1 was the delivering node. After 5 messages had been consumed, I killed the master-1 node to simulate a crash. I saw slave-1 become live, then yield back to master-1 after Kubernetes respawned it. The listener threw a JMSException saying the connection was lost, and it tried to reconnect. I then saw it successfully connect to master-0 (the queue was created there and its consumer count was > 0). However, the queue on master-0 was empty, while the same queue on master-1 still had 15 messages and no consumer attached to it. I waited for a while, but the 15 messages were never delivered. I am not sure why redistribution did not kick in.
The attributes of the wildcard queue on master-1 looked like this when it came back online after the crash (I manually replaced the value of the accessToken field since it contains sensitive info):
Attribute Value
Acknowledge attempts 0
Address QUEUE.service2.*.*.*.notification
Configuration managed false
Consumer count 0
Consumers before dispatch 0
Dead letter address DLQ
Delay before dispatch -1
Delivering count 0
Delivering size 0
Durable true
Durable delivering count 0
Durable delivering size 0
Durable message count 15
Durable persistent size 47705
Durable scheduled count 0
Durable scheduled size 0
Enabled true
Exclusive false
Expiry address ExpiryQueue
Filter
First message age 523996
First message as json [{"JMSType":"service2","address":"QUEUE.service2.tech-drive2.188100000059.thai.notification","messageID":68026,"sentAt":1621957145988,"accessToken":"REMOVED","type":3,"priority":4,"userID":"ID:56c7b509-bd6f-11eb-a348-de0dacf99072","_AMQ_GROUP_ID":"tech-drive2-188100000059-thai","sentBy":"[email protected]","durable":true,"JMSReplyTo":"queue://QUEUE.service2.tech-drive2.188100000059.thai.notification","__AMQ_CID":"e4469ea3-bd62-11eb-a348-de0dacf99072","sentFrom":"service2","originalDestination":"QUEUE.service2.tech-drive2.188100000059.thai.notification","_AMQ_ROUTING_TYPE":1,"JMSCorrelationID":"c329c733-1170-440a-9080-992a009d87a9","expiration":0,"timestamp":1621957145988}]
First message timestamp 1621957145988
Group buckets -1
Group count 0
Group first key
Group rebalance false
Group rebalance pause dispatch false
Id 119
Last value false
Last value key
Max consumers -1
Message count 15
Messages acknowledged 0
Messages added 15
Messages expired 0
Messages killed 0
Name QUEUE.service2.*.*.*.notification
Object Name org.apache.activemq.artemis:broker="activemq-artemis-master-1",component=addresses,address="QUEUE.service2.\*.\*.\*.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.\*.\*.\*.notification"
Paused false
Persistent size 47705
Prepared transaction message count 0
Purge on no consumers false
Retroactive resource false
Ring size -1
Routing type ANYCAST
Scheduled count 0
Scheduled size 0
Temporary false
User f7bcdaed-8c0c-4bb5-ad03-ec06382cb557
The attributes of the wildcard queue on master-0 look like this:
Attribute Value
Acknowledge attempts 0
Address QUEUE.service2.*.*.*.notification
Configuration managed false
Consumer count 3
Consumers before dispatch 0
Dead letter address DLQ
Delay before dispatch -1
Delivering count 0
Delivering size 0
Durable true
Durable delivering count 0
Durable delivering size 0
Durable message count 0
Durable persistent size 0
Durable scheduled count 0
Durable scheduled size 0
Enabled true
Exclusive false
Expiry address ExpiryQueue
Filter
First message age
First message as json [{}]
First message timestamp
Group buckets -1
Group count 0
Group first key
Group rebalance false
Group rebalance pause dispatch false
Id 119
Last value false
Last value key
Max consumers -1
Message count 0
Messages acknowledged 0
Messages added 0
Messages expired 0
Messages killed 0
Name QUEUE.service2.*.*.*.notification
Object Name org.apache.activemq.artemis:broker="activemq-artemis-master-0",component=addresses,address="QUEUE.service2.\*.\*.\*.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.\*.\*.\*.notification"
Paused false
Persistent size 0
Prepared transaction message count 0
Purge on no consumers false
Retroactive resource false
Ring size -1
Routing type ANYCAST
Scheduled count 0
Scheduled size 0
Temporary false
User f7bcdaed-8c0c-4bb5-ad03-ec06382cb557
The Artemis version in use is 2.17.0. Here is my cluster config from master-0's broker.xml. The configs are the same for the other brokers except that the connector-ref is changed to match each broker:
<?xml version="1.0"?>
<configuration xmlns="urn:activemq" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
      <name>activemq-artemis-master-0</name>
      <persistence-enabled>true</persistence-enabled>
      <journal-type>ASYNCIO</journal-type>
      <paging-directory>data/paging</paging-directory>
      <bindings-directory>data/bindings</bindings-directory>
      <journal-directory>data/journal</journal-directory>
      <large-messages-directory>data/large-messages</large-messages-directory>
      <journal-datasync>true</journal-datasync>
      <journal-min-files>2</journal-min-files>
      <journal-pool-files>10</journal-pool-files>
      <journal-device-block-size>4096</journal-device-block-size>
      <journal-file-size>10M</journal-file-size>
      <journal-buffer-timeout>100000</journal-buffer-timeout>
      <journal-max-io>4096</journal-max-io>
      <disk-scan-period>5000</disk-scan-period>
      <max-disk-usage>90</max-disk-usage>
      <critical-analyzer>true</critical-analyzer>
      <critical-analyzer-timeout>120000</critical-analyzer-timeout>
      <critical-analyzer-check-period>60000</critical-analyzer-check-period>
      <critical-analyzer-policy>HALT</critical-analyzer-policy>
      <page-sync-timeout>2244000</page-sync-timeout>
      <acceptors>
         <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
         <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
         <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
         <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
         <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
      </acceptors>
      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>
      <address-settings>
         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redistribution-delay>60000</redistribution-delay>
            <redelivery-delay>0</redelivery-delay>
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
      </address-settings>
      <addresses>
         <address name="DLQ">
            <anycast>
               <queue name="DLQ"/>
            </anycast>
         </address>
         <address name="ExpiryQueue">
            <anycast>
               <queue name="ExpiryQueue"/>
            </anycast>
         </address>
      </addresses>
      <cluster-user>clusterUser</cluster-user>
      <cluster-password>aShortclusterPassword</cluster-password>
      <connectors>
         <connector name="activemq-artemis-master-0">tcp://activemq-artemis-master-0.activemq-artemis-master.svc.cluster.local:61616</connector>
         <connector name="activemq-artemis-slave-0">tcp://activemq-artemis-slave-0.activemq-artemis-slave.svc.cluster.local:61616</connector>
         <connector name="activemq-artemis-master-1">tcp://activemq-artemis-master-1.activemq-artemis-master.svc.cluster.local:61616</connector>
         <connector name="activemq-artemis-slave-1">tcp://activemq-artemis-slave-1.activemq-artemis-slave.svc.cluster.local:61616</connector>
         <connector name="activemq-artemis-master-2">tcp://activemq-artemis-master-2.activemq-artemis-master.svc.cluster.local:61616</connector>
         <connector name="activemq-artemis-slave-2">tcp://activemq-artemis-slave-2.activemq-artemis-slave.svc.cluster.local:61616</connector>
      </connectors>
      <cluster-connections>
         <cluster-connection name="activemq-artemis">
            <connector-ref>activemq-artemis-master-0</connector-ref>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.1</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <!-- scale-down>true</scale-down -->
            <static-connectors>
               <connector-ref>activemq-artemis-master-0</connector-ref>
               <connector-ref>activemq-artemis-slave-0</connector-ref>
               <connector-ref>activemq-artemis-master-1</connector-ref>
               <connector-ref>activemq-artemis-slave-1</connector-ref>
               <connector-ref>activemq-artemis-master-2</connector-ref>
               <connector-ref>activemq-artemis-slave-2</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
      <ha-policy>
         <replication>
            <master>
               <group-name>activemq-artemis-0</group-name>
               <quorum-vote-wait>12</quorum-vote-wait>
               <vote-on-replication-failure>true</vote-on-replication-failure>
               <!--we need this for auto failback-->
               <check-for-live-server>true</check-for-live-server>
            </master>
         </replication>
      </ha-policy>
   </core>
   <core xmlns="urn:activemq:core">
      <jmx-management-enabled>true</jmx-management-enabled>
   </core>
</configuration>
From another Stack Overflow answer, I understand that my high-availability topology is redundant, and I am planning to remove the slaves. However, I don't think the slaves are the cause of message redistribution not working. Is there a config I am missing to handle an Artemis node crash?
Update 1: As Justin suggested, I tried a cluster of 2 Artemis nodes without HA.
activemq-artemis-master-0 1/1 Running 0 27m
activemq-artemis-master-1 1/1 Running 0 74s
The following is the broker.xml of the 2 Artemis nodes. The only differences between them are the node name and the journal-buffer-timeout:
<?xml version="1.0"?>
<configuration xmlns="urn:activemq" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
      <name>activemq-artemis-master-0</name>
      <persistence-enabled>true</persistence-enabled>
      <journal-type>ASYNCIO</journal-type>
      <paging-directory>data/paging</paging-directory>
      <bindings-directory>data/bindings</bindings-directory>
      <journal-directory>data/journal</journal-directory>
      <large-messages-directory>data/large-messages</large-messages-directory>
      <journal-datasync>true</journal-datasync>
      <journal-min-files>2</journal-min-files>
      <journal-pool-files>10</journal-pool-files>
      <journal-device-block-size>4096</journal-device-block-size>
      <journal-file-size>10M</journal-file-size>
      <journal-buffer-timeout>100000</journal-buffer-timeout>
      <journal-max-io>4096</journal-max-io>
      <disk-scan-period>5000</disk-scan-period>
      <max-disk-usage>90</max-disk-usage>
      <critical-analyzer>true</critical-analyzer>
      <critical-analyzer-timeout>120000</critical-analyzer-timeout>
      <critical-analyzer-check-period>60000</critical-analyzer-check-period>
      <critical-analyzer-policy>HALT</critical-analyzer-policy>
      <page-sync-timeout>2244000</page-sync-timeout>
      <acceptors>
         <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
         <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
         <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
         <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
         <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
      </acceptors>
      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>
      <cluster-user>ClusterUser</cluster-user>
      <cluster-password>longClusterPassword</cluster-password>
      <connectors>
         <connector name="activemq-artemis-master-0">tcp://activemq-artemis-master-0.activemq-artemis-master.ncp-stack-testing.svc.cluster.local:61616</connector>
         <connector name="activemq-artemis-master-1">tcp://activemq-artemis-master-1.activemq-artemis-master.ncp-stack-testing.svc.cluster.local:61616</connector>
      </connectors>
      <cluster-connections>
         <cluster-connection name="activemq-artemis">
            <connector-ref>activemq-artemis-master-0</connector-ref>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.1</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <static-connectors>
               <connector-ref>activemq-artemis-master-0</connector-ref>
               <connector-ref>activemq-artemis-master-1</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
      <address-settings>
         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redistribution-delay>60000</redistribution-delay>
            <redelivery-delay>0</redelivery-delay>
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
      </address-settings>
      <addresses>
         <address name="DLQ">
            <anycast>
               <queue name="DLQ"/>
            </anycast>
         </address>
         <address name="ExpiryQueue">
            <anycast>
               <queue name="ExpiryQueue"/>
            </anycast>
         </address>
      </addresses>
   </core>
   <core xmlns="urn:activemq:core">
      <jmx-management-enabled>true</jmx-management-enabled>
   </core>
</configuration>
With this setup, I still got the same result: after the Artemis node crashed and came back, the leftover messages were not moved to the other node.
Update 2: I tried a non-wildcard queue as Justin suggested but still got the same behavior. One difference I noticed is that with the non-wildcard queue the consumer count is only 1, compared to 3 in the wildcard case. Here are the attributes of the old queue after the crash:
Acknowledge attempts 0
Address QUEUE.service2.tech-drive2.188100000059.thai.notification
Configuration managed false
Consumer count 0
Consumers before dispatch 0
Dead letter address DLQ
Delay before dispatch -1
Delivering count 0
Delivering size 0
Durable true
Durable delivering count 0
Durable delivering size 0
Durable message count 15
Durable persistent size 102245
Durable scheduled count 0
Durable scheduled size 0
Enabled true
Exclusive false
Expiry address ExpiryQueue
Filter
First message age 840031
First message as json [{"JMSType":"service2","address":"QUEUE.service2.tech-drive2.188100000059.thai.notification","messageID":8739,"sentAt":1621969900922,"accessToken":"DONOTDISPLAY","type":3,"priority":4,"userID":"ID:09502dc0-bd8d-11eb-b75c-c6609f1332c9","_AMQ_GROUP_ID":"tech-drive2-188100000059-thai","sentBy":"[email protected]","durable":true,"JMSReplyTo":"queue://QUEUE.service2.tech-drive2.188100000059.thai.notification","__AMQ_CID":"c292b418-bd8b-11eb-b75c-c6609f1332c9","sentFrom":"service2","originalDestination":"QUEUE.service2.tech-drive2.188100000059.thai.notification","_AMQ_ROUTING_TYPE":1,"JMSCorrelationID":"90b783d0-d9cc-4188-9c9e-3453786b2105","expiration":0,"timestamp":1621969900922}]
First message timestamp 1621969900922
Group buckets -1
Group count 0
Group first key
Group rebalance false
Group rebalance pause dispatch false
Id 606
Last value false
Last value key
Max consumers -1
Message count 15
Messages acknowledged 0
Messages added 15
Messages expired 0
Messages killed 0
Name QUEUE.service2.tech-drive2.188100000059.thai.notification
Object Name org.apache.activemq.artemis:broker="activemq-artemis-master-0",component=addresses,address="QUEUE.service2.tech-drive2.188100000059.thai.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.tech-drive2.188100000059.thai.notification"
Paused false
Persistent size 102245
Prepared transaction message count 0
Purge on no consumers false
Retroactive resource false
Ring size -1
Routing type ANYCAST
Scheduled count 0
Scheduled size 0
Temporary false
User 6e25e08b-9587-40a3-b7e9-146360539258
and here are the attributes of the new queue:
Attribute Value
Acknowledge attempts 0
Address QUEUE.service2.tech-drive2.188100000059.thai.notification
Configuration managed false
Consumer count 1
Consumers before dispatch 0
Dead letter address DLQ
Delay before dispatch -1
Delivering count 0
Delivering size 0
Durable true
Durable delivering count 0
Durable delivering size 0
Durable message count 0
Durable persistent size 0
Durable scheduled count 0
Durable scheduled size 0
Enabled true
Exclusive false
Expiry address ExpiryQueue
Filter
First message age
First message as json [{}]
First message timestamp
Group buckets -1
Group count 0
Group first key
Group rebalance false
Group rebalance pause dispatch false
Id 866
Last value false
Last value key
Max consumers -1
Message count 0
Messages acknowledged 0
Messages added 0
Messages expired 0
Messages killed 0
Name QUEUE.service2.tech-drive2.188100000059.thai.notification
Object Name org.apache.activemq.artemis:broker="activemq-artemis-master-1",component=addresses,address="QUEUE.service2.tech-drive2.188100000059.thai.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.tech-drive2.188100000059.thai.notification"
Paused false
Persistent size 0
Prepared transaction message count 0
Purge on no consumers false
Retroactive resource false
Ring size -1
Routing type ANYCAST
Scheduled count 0
Scheduled size 0
Temporary false
User 6e25e08b-9587-40a3-b7e9-146360539258
I've taken your simplified configuration with just 2 nodes using a non-wildcard queue with a redistribution-delay of 0, and I reproduced the behavior you're seeing on my local machine (i.e. without Kubernetes). I believe I see why the behavior is what it is, but in order to understand it you first must understand how redistribution works in the first place.

In a cluster, every time a consumer is created, the node on which the consumer is created notifies every other node in the cluster about the consumer. If other nodes in the cluster have messages in their corresponding queue but don't have any consumers, then those other nodes redistribute their messages to the node with the consumer (assuming message-load-balancing is ON_DEMAND and redistribution-delay is >= 0).

In your case, however, the node with the messages is actually down when the consumer is created on the other node, so it never receives the notification about the consumer. Therefore, once that node restarts it doesn't know about the consumer on the other node and does not redistribute its messages.
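For reference, that redistribution-delay of 0 is just the catch-all address-setting from the broker.xml above with a smaller value, e.g.:

<address-setting match="#">
   <redistribution-delay>0</redistribution-delay>
</address-setting>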
I see you've opened ARTEMIS-3321 to enhance the broker to deal with this situation. However, that will take time to develop and release (assuming the change is approved). My recommendation in the meantime would be to configure client reconnection, which is discussed in the documentation, e.g.:
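(The original inline example is not preserved above. A plausible sketch uses the core client's standard reconnectAttempts URL parameter; 30 attempts at the default 2000 ms retryInterval is roughly 1 minute:)

tcp://activemq-artemis-master-0.activemq-artemis-master.svc.cluster.local:61616?reconnectAttempts=30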
Given the default retryInterval of 2000 milliseconds, that will give the broker to which the client was originally connected 1 minute to come back up before the client gives up trying to reconnect and throws an exception, at which point the application can completely re-initialize its connection as it is currently doing now.

Since you're using Spring Boot, be sure to use version 2.5.0, as it contains this change which will allow you to specify the broker URL rather than just a host and port.
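For illustration, assuming Spring Boot 2.5's spring.artemis.broker-url property and the broker addresses from the cluster above, application.properties could look like:

spring.artemis.broker-url=(tcp://activemq-artemis-master-0.activemq-artemis-master.svc.cluster.local:61616,tcp://activemq-artemis-master-1.activemq-artemis-master.svc.cluster.local:61616)?reconnectAttempts=30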
Lastly, keep in mind that shutting the node down gracefully will short-circuit the client's reconnect and trigger your application to re-initialize the connection, which is not what we want here. Be sure to kill the node ungracefully (e.g. using kill -9 <pid>).
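In a Kubernetes setup like yours, one way to do that (an assumption, not part of the original answer) is to force-delete the pod with a zero grace period so the broker process gets SIGKILL instead of a clean shutdown:

kubectl delete pod activemq-artemis-master-1 --grace-period=0 --force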