All actions are performed in Debian 7 virtual machines. Two nodes have the following installed: the Galera replicator, MySQL-Galera from Codership, percona-xtrabackup, and netcat-openbsd (required by percona-xtrabackup). The third node has only the Galera replicator installed and acts as an arbitrator running garbd.
Config on node #1 (192.168.0.102)
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=2G"
wsrep_cluster_name="clusterTest"
wsrep_cluster_address="gcomm://"
wsrep_node_name="node-1"
wsrep_node_address=192.168.0.102
wsrep_node_incoming_address=192.168.0.102
wsrep_slave_threads=16
wsrep_sst_method=xtrabackup
wsrep_sst_receive_address=192.168.0.102
wsrep_sst_auth=root:somepass
Config on node #2 (192.168.0.103)
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=2G"
wsrep_cluster_name="clusterTest"
wsrep_cluster_address="gcomm://192.168.0.102"
wsrep_node_name="node-2"
wsrep_node_address=192.168.0.103
wsrep_node_incoming_address=192.168.0.103
wsrep_slave_threads=16
wsrep_sst_method=xtrabackup
wsrep_sst_receive_address=192.168.0.103
wsrep_sst_auth=root:somepass
wsrep_sst_donor="node-1"
On the first run, only node-1 has a test database; let's call it testDB.
Here is what I do:
1. node-1> service mysql start
Result: the node is working, and testDB is accessible both from the node itself and from any remote host.
2. node-3> garbd --address gcomm://192.168.0.102,192.168.0.103 --group "clusterTest"
Result: the cluster size is 2.
3. node-2> service mysql start
Result: the cluster size is 3, but the init script reports that the service failed to start. However, the processes are running and the SST is performed.
Also, I can't access MySQL running on node-2 locally:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
Nor from a remote host:
PHP Warning: mysqli::mysqli(): (HY000/2003): Can't connect to MySQL server on '192.168.0.103' (111)
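Both errors are "connection refused" (111) rather than an authentication error, so one plausible reading is that mysqld on node-2 had not started listening yet because the SST was still in progress. A minimal sketch for waiting on that, assuming a POSIX shell: the helper `wait_until` is hypothetical (not part of MySQL or Galera) and simply retries a command once per second until it succeeds or a timeout expires.

```shell
# Hypothetical helper: run a command once per second until it succeeds
# or until TIMEOUT seconds have passed. Exits 0 on success, 1 on timeout.
# The body runs in a subshell, so its variables do not leak to the caller.
wait_until() (
  timeout_s=$1
  shift
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      exit 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  exit 1
)
```

On node-2, `wait_until 600 mysqladmin ping` would then block for up to ten minutes (an arbitrary limit; a large SST may take longer). `mysqladmin ping` exits 0 as soon as the server answers, even if the connection is then refused for lack of credentials.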
Cluster state from node-1:
wsrep_local_state_comment | Donor/Desynced
wsrep_incoming_addresses | 192.168.0.102:3306,,192.168.0.103:3306
wsrep_cluster_conf_id | 3
wsrep_cluster_size | 3
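The values above come from `SHOW GLOBAL STATUS LIKE 'wsrep_%'`. When the same check has to be repeated on several nodes, a small filter can extract a single variable. This is a sketch assuming the output is produced by `mysql -N -e ...` piped into it, which makes the client print tab-separated name/value pairs without a header row; the function name `wsrep_value` is hypothetical.

```shell
# Hypothetical helper: print the value of one wsrep status variable.
# Expects "name<TAB>value" lines on stdin, e.g. the output of
#   mysql -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"
wsrep_value() {
  awk -F '\t' -v key="$1" '$1 == key { print $2 }'
}
```

For example, `mysql -u root -p -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_value wsrep_local_state_comment` prints the node's current state. The empty entry in wsrep_incoming_addresses most likely belongs to garbd, which does not accept client connections.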
If I start MySQL on node-2 with wsrep_provider set to "none", the database is fully accessible from both local and remote hosts and is identical to the database on node-1. If I start the cluster again, the situation repeats: node-2 is only visible to the other nodes, the cluster becomes desynced, and node-2 is accessible neither from the console nor from remote hosts.
Your most helpful tool when troubleshooting Galera issues will be the MySQL error log. On Debian, it is written to /var/log/syslog by default.
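Because syslog mixes mysqld's messages with everything else on the system, it can help to filter it. A minimal sketch, assuming the relevant lines contain "WSREP" or "SST" (a heuristic that may miss some messages or catch unrelated ones); the function name `galera_log` is hypothetical.

```shell
# Hypothetical helper: show Galera-related lines from a log file.
# Defaults to /var/log/syslog, the Debian default; the case-insensitive
# WSREP/SST pattern is a heuristic, not an exhaustive filter.
galera_log() {
  grep -i -E 'wsrep|sst' "${1:-/var/log/syslog}"
}
```

Running `galera_log | tail -n 100` as root on both node-1 (the donor) and node-2 (the joiner) while node-2 starts should show both sides of the state transfer.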
It appears you're using node-1 to bootstrap your cluster. Getting your wsrep_cluster_address settings right is critical. The settings for both nodes should be as follows: