Neo4j v2.2.3 Embedded HA: can't create cluster


I have three identical servers running embedded Neo4j v2.2.3, and I'm trying to turn a single database into an HA setup. I've tried starting the HA process with every combination of databases (all empty, all but one empty, and all using the same database), but to no avail. As far as I can tell, the Neo4j instances can't connect to each other for some reason. I have verified that the IP addresses are correct, and port 5001 should be open. I've also opened port 6001.

Here is my messages.log.

2015-06-25 20:37:16.461+0000 INFO  [o.n.k.i.DiagnosticsManager]: --- INITIALIZED diagnostics START ---
2015-06-25 20:37:16.462+0000 INFO  [o.n.k.i.DiagnosticsManager]: Neo4j Kernel properties:
2015-06-25 20:37:16.467+0000 INFO  [o.n.k.i.DiagnosticsManager]: ha.server_id=1
2015-06-25 20:37:16.467+0000 INFO  [o.n.k.i.DiagnosticsManager]: ha.server=:6001
2015-06-25 20:37:16.467+0000 INFO  [o.n.k.i.DiagnosticsManager]: online_backup_server=0.0.0.0:6362
2015-06-25 20:37:16.467+0000 INFO  [o.n.k.i.DiagnosticsManager]: ephemeral=false
2015-06-25 20:37:16.467+0000 INFO  [o.n.k.i.DiagnosticsManager]: ha.initial_hosts=[IP1]:5001,[IP2]:5001,[IP3]:5001
2015-06-25 20:37:16.467+0000 INFO  [o.n.k.i.DiagnosticsManager]: online_backup_enabled=true
2015-06-25 20:37:16.468+0000 INFO  [o.n.k.i.DiagnosticsManager]: ha.cluster_server=:5001
2015-06-25 20:37:16.468+0000 INFO  [o.n.k.i.DiagnosticsManager]: store_dir=/var/neo4j
2015-06-25 20:37:16.468+0000 INFO  [o.n.k.i.DiagnosticsManager]: org.neo4j.server.webserver.address=0.0.0.0
2015-06-25 20:37:16.468+0000 INFO  [o.n.k.i.DiagnosticsManager]: org.neo4j.server.database.mode=HA
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: Diagnostics providers:
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: org.neo4j.kernel.configuration.Config
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: org.neo4j.kernel.info.DiagnosticsManager
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: SYSTEM_MEMORY
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: JAVA_MEMORY
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: OPERATING_SYSTEM
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: JAVA_VIRTUAL_MACHINE
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: CLASSPATH
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: LIBRARY_PATH
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: SYSTEM_PROPERTIES
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: LINUX_SCHEDULERS
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: NETWORK
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: NodeCache
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: RelationshipCache
2015-06-25 20:37:16.469+0000 INFO  [o.n.k.i.DiagnosticsManager]: HighAvailabilityDiagnostics

....

2015-06-25 21:55:36.502+0000 INFO  [o.n.k.i.DiagnosticsManager]: High Availability diagnostics
Member state:PENDING
State machines:
   AtomicBroadcastMessage:start
   AcceptorMessage:start
   ProposerMessage:start
   LearnerMessage:start
   HeartbeatMessage:start
   ElectionMessage:start
   SnapshotMessage:start
   ClusterMessage:start
Current timeouts:

Eventually after two minutes I get a transaction exception:

Caused by: org.neo4j.graphdb.TransactionFailureException: Timeout waiting for database to become available and allow new transactions. Waited 2m. 2 reasons for blocking: Database is stopped, Cluster state is 'PENDING'.

I create graphDatabaseFactory = new HighlyAvailableGraphDatabaseFactory(), which is then used to create the database:

DatabaseServiceImpl(
    graphDatabaseFactory
      .newEmbeddedDatabaseBuilder(neo4jStoreDir)
      .loadPropertiesFromFile(configFileLocation)
      .newGraphDatabase())

This is what my neo4j.properties looks like:

online_backup_enabled=true
online_backup_server=0.0.0.0:6362
org.neo4j.server.webserver.address=0.0.0.0
org.neo4j.server.database.mode=HA
ha.server_id=1
ha.cluster_server=0.0.0.0:5001
ha.server=0.0.0.0:6001
ha.initial_hosts=[IP1]:5001,[IP2]:5001,[IP3]:5001

I've tried many different combinations of the properties and also added the suggested values from neo4j-server.properties, but nothing helps. Where should I put neo4j-server.properties in embedded mode, or is it not needed (that's my initial guess)?

What might be wrong? Is it even possible to set up an HA cluster using embedded Neo4j anymore?

EDIT. I made sure every server is on the same subnet and the servers can connect to each other without obstructions.


There are 2 answers

0
ttiurani (BEST ANSWER)

So the problem turned out to be that I was setting kernel extensions with new HighlyAvailableGraphDatabaseFactory().addKernelExtensions(myKernelExtensionsArray). The addKernelExtensions method is deprecated, but the extensions work fine in a single-server setup. In this HA setup, however, they fail for some reason.

I was able to reuse my kernel extensions by replacing the call to addKernelExtensions with registerTransactionEventHandler.

7
Stefan Armbruster

Neo4j definitely supports clustering in embedded mode - you can even mix server and embedded instances in the same cluster.

You don't need the settings from neo4j-server.properties at all when running in embedded mode.
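To spot such settings in an existing file, something like the following could help. It is only a sketch using the Java standard library: it assumes that server-only settings can be recognized by the org.neo4j.server. prefix (as the two in the question's neo4j.properties can), and the class name EmbeddedConfigCheck and its method are made up for illustration, not part of Neo4j.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper, not part of Neo4j.
final class EmbeddedConfigCheck {

    /** Returns, in sorted order, the keys that start with "org.neo4j.server.". */
    static List<String> serverOnlyKeys(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            // Assumption: this prefix marks settings meant for neo4j-server.properties.
            if (key.startsWith("org.neo4j.server.")) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A shortened copy of the neo4j.properties from the question.
        String config = String.join("\n",
                "online_backup_enabled=true",
                "org.neo4j.server.webserver.address=0.0.0.0",
                "org.neo4j.server.database.mode=HA",
                "ha.server_id=1",
                "ha.cluster_server=0.0.0.0:5001",
                "ha.server=0.0.0.0:6001");
        Properties props = new Properties();
        props.load(new StringReader(config));
        // Prints the keys that could be dropped in embedded mode.
        System.out.println(serverOnlyKeys(props));
    }
}
```

Run against the question's file, this would list org.neo4j.server.database.mode and org.neo4j.server.webserver.address as the candidates to remove.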

Some things to check:

  1. Make sure the 3 cluster members are on the same subnet. If they aren't because they're in different physical locations, consider establishing a VPN (e.g. via openvpn) to put them on the same subnet.
  2. Allow any IP traffic between the cluster members, since they will open up additional ports.
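
The checks above can be approximated from each machine with a small TCP probe. This is a minimal sketch using only the Java standard library; it assumes the ports from the question's configuration (5001 for ha.cluster_server, 6001 for ha.server), and the class name ClusterPortProbe is made up for illustration. A successful connect only shows that something is listening and that no firewall blocks that port; it does not prove that the cluster will form, and it does not cover any additional ports the members open later.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Neo4j.
final class ClusterPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Refused, timed out, unresolvable host, or invalid port number.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the question's neo4j.properties (an assumption for other setups).
        int[] ports = {5001, 6001};
        // Pass the addresses of the other cluster members as arguments.
        for (String host : args) {
            for (int port : ports) {
                String status = canConnect(host, port, 2000) ? "reachable" : "NOT reachable";
                System.out.println(host + ":" + port + " " + status);
            }
        }
    }
}
```

Run it on each member while the other instances are starting up, passing the other members' addresses as arguments. A port reported as not reachable in one direction only would point to a firewall or routing rule rather than to the Neo4j configuration.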