Infinispan Clustered Cache & JGroups - Servers don't see each other

I'm using Infinispan to create a distributed cache between two servers and to leverage its failover feature.

I initially tested my webservice on two local instances of Tomcat, using the pre-configured JGroups configuration file provided by infinispan-core-7.0.0.Final.jar. The distributed cache worked between the two Tomcat instances, since the pre-configured XML files bind to the loopback IP address.

I then moved the webservice onto two separate servers and have been unable to get them to join the same group. I created my own custom JGroups TCP configuration XML, because the loopback IP in the pre-configured one was causing issues.

I don't have much experience setting up a TCP or UDP channel, so I suspect the problem lies in my JGroups configuration file (I based it on the pre-configured one).

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">
        <!-- bind_addr="${jgroups.tcp.address:127.0.0.1}"-->
   <TCP
        bind_addr="GLOBAL"
        bind_port="${jgroups.tcp.port:7800}"
        port_range="30"
        recv_buf_size="20m"
        send_buf_size="640k"
        max_bundle_size="31k"
        use_send_queues="true"
        enable_diagnostics="false"
        bundler_type="sender-sends-with-timer"

        thread_naming_pattern="pl"

        thread_pool.enabled="true"
        thread_pool.min_threads="2"
        thread_pool.max_threads="30"
        thread_pool.keep_alive_time="60000"
        thread_pool.queue_enabled="true"
        thread_pool.queue_max_size="100"
        thread_pool.rejection_policy="Discard"

        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="2"
        oob_thread_pool.max_threads="30"
        oob_thread_pool.keep_alive_time="60000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="Discard"

        internal_thread_pool.enabled="true"
        internal_thread_pool.min_threads="2"
        internal_thread_pool.max_threads="4"
        internal_thread_pool.keep_alive_time="60000"
        internal_thread_pool.queue_enabled="true"
        internal_thread_pool.queue_max_size="100"
        internal_thread_pool.rejection_policy="Discard"
        />

   <!-- Ergonomics, new in JGroups 2.11, are disabled by default in TCPPING until JGRP-1253 is resolved -->
   <!--
   <TCPPING timeout="3000"
            initial_hosts="localhost[7800],localhost[7801]"
            port_range="5"
            num_initial_members="3"
            ergonomics="false"
        />
   -->

   <!-- bind_addr="${jgroups.bind_addr:127.0.0.1}" -->
 <!--  ip_ttl="${jgroups.udp.ip_ttl:2}"-->
   <MPING bind_addr="GLOBAL" break_on_coord_rsp="true"
      mcast_addr="${jgroups.mping.mcast_addr:228.2.4.6}"
      mcast_port="${jgroups.mping.mcast_port:43366}"
      num_initial_members="3"/>
   <MERGE3/>

   <FD_SOCK/>
   <FD timeout="3000" max_tries="5"/>
   <VERIFY_SUSPECT timeout="1500"/>

   <pbcast.NAKACK2 use_mcast_xmit="false"
                   xmit_interval="1000"
                   xmit_table_num_rows="100"
                   xmit_table_msgs_per_row="10000"
                   xmit_table_max_compaction_time="10000"
                   max_msg_batch_size="100"/>
   <UNICAST3 xmit_interval="500"
             xmit_table_num_rows="20"
             xmit_table_msgs_per_row="10000"
             xmit_table_max_compaction_time="10000"
             max_msg_batch_size="100"
             conn_expiry_timeout="0"/>

   <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
   <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
   <tom.TOA/> <!-- the TOA is only needed for total order transactions-->

   <MFC max_credits="2m" min_threshold="0.40"/>
   <FRAG2 frag_size="30k"/>
   <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
</config>

My initial thought is that the problem lies with the bind_addr in the TCP and MPING elements. The two servers are on the same network and can ping each other. Does anyone have any tips or insights on the configuration file above?
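For reference, JGroups accepts either a literal IP or a symbolic keyword in bind_addr, which is what I've been experimenting with (a sketch; the system-property names mirror the pre-configured file, and SITE_LOCAL is just one of the possible values):

```xml
<!-- bind_addr variants (sketch):
     GLOBAL     - pick a globally routable address
     SITE_LOCAL - pick a site-local (private) address, e.g. 10.x or 192.168.x
     LOOPBACK   - 127.0.0.1 (what the pre-configured file effectively used)
     The system property, if set, overrides the default after the colon. -->
<TCP bind_addr="${jgroups.tcp.address:SITE_LOCAL}"
     bind_port="${jgroups.tcp.port:7800}" />
```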

In case it helps, I've posted the Infinispan/JGroups startup log output below:

SERVER 1:

INFO  JGroupsTransport - ISPN000078: Starting JGroups channel esrs
Nov 20, 2014 3:22:43 AM org.jgroups.logging.JDKLogImpl warn
WARNING: JGRP000014: Discovery.num_initial_members has been deprecated: will be ignored
INFO  JGroupsTransport - ISPN000094: Received new cluster view for channel esrs: [udmesrs02-61057|0] (1) [udmesrs02-61057]
INFO  JGroupsTransport - ISPN000079: Channel esrs local address is udmesrs02-61057
INFO  GlobalComponentRegistry - ISPN000128: Infinispan version: Infinispan 'Guinness' 7.0.0.Final

SERVER 2:

INFO  JGroupsTransport - ISPN000078: Starting JGroups channel esrs
Nov 20, 2014 3:20:28 AM org.jgroups.logging.JDKLogImpl warn
WARNING: JGRP000014: Discovery.num_initial_members has been deprecated: will be ignored
INFO  JGroupsTransport - ISPN000094: Received new cluster view for channel esrs: [udmesrs01-16389|0] (1) [udmesrs01-16389]
INFO  JGroupsTransport - ISPN000079: Channel esrs local address is udmesrs01-16389
INFO  GlobalComponentRegistry - ISPN000128: Infinispan version: Infinispan 'Guinness' 7.0.0.Final

1 Answer

Answered by Radim Vansa:

There are two possible issues: an IPv4/IPv6 mismatch, and UDP routing.

First, try setting -Djava.net.preferIPv4Stack=true on both machines; otherwise one JVM may bind to an IPv6 address and the other to IPv4, and the nodes will never see each other.
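For Tomcat, one common way to pass that flag is via a setenv.sh script (the file name and location are the standard Tomcat convention, not something from the question):

```shell
# $CATALINA_BASE/bin/setenv.sh -- sourced by catalina.sh on startup.
# Appends the IPv4 preference flag to any existing JVM options.
CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true"
export CATALINA_OPTS
```

Restart both Tomcat instances after adding it, so the flag is picked up by the JVM.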

If that does not help, check your firewall and routing settings for UDP, and in particular for multicast: MPING discovery uses the multicast address 228.2.4.6 on port 43366, so both must be allowed between the two servers.
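If multicast turns out to be blocked entirely, discovery can avoid it altogether. A sketch of the usual alternative (not part of the original answer; it reuses the TCPPING element commented out in the config above, with the host names taken from the logs, on the assumption that they resolve between the two machines):

```xml
<!-- Static discovery: replaces <MPING .../> in the stack above.
     Host names udmesrs01/udmesrs02 are taken from the startup logs;
     adjust the list and ports to match your deployment. -->
<TCPPING timeout="3000"
         initial_hosts="${jgroups.tcpping.initial_hosts:udmesrs01[7800],udmesrs02[7800]}"
         port_range="5"/>
```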

If you don't find anything strange there, you'll have to run tcpdump, filtering on UDP port 43366 and TCP port 7800, and watch for activity: there should be a multicast packet going out from each node at least every 15 s.
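A concrete capture invocation for that check (a sketch; tcpdump needs root privileges, and the interface name depends on the machine):

```shell
# Capture filter covering MPING discovery (UDP 43366) and the TCP data
# port (7800); ports are taken from the JGroups config above.
FILTER='udp port 43366 or tcp port 7800'

# Run on each server; -n disables DNS resolution, -i any listens on all
# interfaces. Look for multicast packets to 228.2.4.6 from each node:
#   sudo tcpdump -n -i any "$FILTER"
echo "capture filter: $FILTER"
```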