I am evaluating Galera Cluster and I can't explain the test results.
When comparing a single-node Galera with a standalone MariaDB 10.1.20, I have noticed a suspiciously big performance difference with durable/non-durable settings:
- Galera is 3x slower than standalone, both with durable settings
- Durable Galera is 3x slower than non-durable Galera
Config:
[mysqld]
# durable
sync_binlog=1
innodb_flush_log_at_trx_commit=1
# non-durable
# sync_binlog=0
# innodb_flush_log_at_trx_commit=2
max_connections=2000
query_cache_type=0
query_cache_size=0
log_bin=1
binlog_format=ROW
log_slave_updates=1
innodb_flush_method=O_DIRECT
innodb_buffer_pool_size=4000M
innodb_buffer_pool_instances=4
innodb_log_buffer_size=64M
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
innodb-autoinc-lock-mode=2
wsrep_cluster_name=galera
wsrep_node_address=node1
wsrep_node_name=node1
wsrep_cluster_address=gcomm://
wsrep_sst_method=rsync
wsrep_slave_threads=8
Benchmark: Sysbench 0.5
sysbench \
--test=/usr/share/doc/sysbench/tests/db/oltp.lua \
--mysql-host=localhost \
--mysql-user=root \
--oltp-table-size=1000000 \
--num-threads=128 \
--max-requests=0 \
--max-time=60 run
Results:
Galera, durable
read/write requests: 4994.74 per sec.
Standalone, durable
read/write requests: 16858.99 per sec.
Galera, non-durable
read/write requests: 15938.04 per sec.
Standalone, non-durable
read/write requests: 17055.88 per sec.
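As a quick sanity check on the "3x" figures, the ratios can be computed directly from the four results above. This is only a minimal sketch: the `results` dictionary and the `slowdown` helper are illustrative names, not part of sysbench or Galera.

```python
# Throughput (read/write requests per second), copied from the results above.
results = {
    ("galera", "durable"): 4994.74,
    ("standalone", "durable"): 16858.99,
    ("galera", "non-durable"): 15938.04,
    ("standalone", "non-durable"): 17055.88,
}


def slowdown(slow, fast):
    """Return how many times lower the throughput of `slow` is than `fast`."""
    return results[fast] / results[slow]


# Durable Galera vs. durable standalone
print(round(slowdown(("galera", "durable"), ("standalone", "durable")), 2))  # 3.38
# Durable Galera vs. non-durable Galera
print(round(slowdown(("galera", "durable"), ("galera", "non-durable")), 2))  # 3.19
# Non-durable Galera vs. non-durable standalone
print(round(slowdown(("galera", "non-durable"), ("standalone", "non-durable")), 2))  # 1.07
```

So both "3x" gaps are roughly 3.2x to 3.4x, while the non-durable configurations differ by only about 7%.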
Server details:
2 Cores
8 GB RAM
CentOS 7
SSD
I have repeated the tests multiple times, even re-bootstrapped the data directory and Galera.
Some observations:
- CPU idle (yes, idle) >50% with the durable Galera, <1% in other testing scenarios
- iowait >20% with the durable Galera, <1% in other testing scenarios
- innodb_flush_log_at_trx_commit=1 slows down InnoDB considerably (YMMV). This is independent of Galera.
- =1 incurs an extra write on every COMMIT, explicit or implicit. This is how it achieves durability even in the face of abrupt power failure.
- =1 is not necessary for Galera -- If a node crashes, rebuild it.

Do not trust a benchmark to judge how your application will run.
Other issues -- Did you have a load balancer sharing the writes among the various nodes? How many nodes? RAID? SSD? Latency between nodes?
And, more importantly, will you ever come close to 16K writes/sec? If not, then the benchmark provides virtually no clue of how well your application will run.