I recently upgraded from DataStax Enterprise 4.6.3 to 4.7, and now I am having trouble running Spark. The problem seems to be that the Spark Master is not configured properly. I use OpsCenter 5.1.3 and started a three-node Analytics cluster. Strangely, the nodes initially had the setting SPARK_ENABLED=0, and I had to set it to 1 manually. Even so, the Spark Master is still not configured properly. In /var/log/cassandra/system.log, I get a long stream of:
INFO [SPARK-WORKER-INIT-0] 2015-06-13 21:59:54,027 SparkWorkerRunner.java:49 - Spark Master not ready at (no configured master)
INFO [SPARK-WORKER-INIT-0] 2015-06-13 21:59:55,028 SparkWorkerRunner.java:49 - Spark Master not ready at (no configured master)
INFO [SPARK-WORKER-INIT-0] 2015-06-13 21:59:56,028 SparkWorkerRunner.java:49 - Spark Master not ready at (no configured master)
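For reference, this is the flag I had to flip on each node before restarting DSE. On package installs it lives in /etc/default/dse (tarball installs read it from bin/dse-env.sh instead):

```shell
# /etc/default/dse (package install); tarball installs use bin/dse-env.sh instead
SPARK_ENABLED=1   # was 0 after the upgrade -- the node starts as plain Cassandra until this is 1
```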
When I try to run dse spark, I get the following error:
java.io.IOException: Spark Master address cannot be retrieved. This really should not be happening with DSE 4.7+ unless your cluster is over 50% down or booted up in the last minute.
at com.datastax.bdp.plugin.SparkPlugin.getMasterAddress(SparkPlugin.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.StandardMBe
My Analytics DC has been up for a few days, and no nodes are booting. This issue has blocked development for the last few days, and I am considering downgrading back to DSE 4.6.3 just so I can run my Spark jobs again. Any help whatsoever is appreciated.
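For anyone checking the same thing: the quickest way I know to see how Gossip counts the DC is nodetool status. As a self-contained sketch, here is how I count UN/DN lines from its output (the sample below is made-up output in the same format, standing in for a live `nodetool status` pipe):

```shell
# Count how many nodes Gossip reports down (DN) vs. total (UN + DN).
# On a live node you would pipe `nodetool status` directly; the sample
# output below (hypothetical addresses) keeps the pipeline runnable.
sample_status='UN  172.31.20.10   120.5 GB  256  ?  host-a  rack1
UN  172.31.20.11   118.2 GB  256  ?  host-b  rack1
UN  172.31.20.12   119.9 GB  256  ?  host-c  rack1
DN  172.31.23.17   ?         256  ?  host-d  rack1
DN  172.31.16.58   ?         256  ?  host-e  rack1
DN  172.31.24.25   ?         256  ?  host-f  rack1
DN  172.31.24.147  ?         256  ?  host-g  rack1'

down=$(printf '%s\n' "$sample_status" | grep -c '^DN')
total=$(printf '%s\n' "$sample_status" | grep -c '^[UD]N')
echo "$down of $total nodes down"   # prints: 4 of 7 nodes down
```

With my three live nodes plus the four phantoms below, Gossip would see 4 of 7 nodes down, which is over 50%.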
UPDATE:
I am looking into the condition that at least 50% of the Analytics nodes must be up for the Spark Master to start. After examining system.log on DSE startup, I noticed that Gossip still thinks some old nodes are part of the cluster, and DOWN. For instance:
INFO [GossipStage:1] 2015-06-14 03:18:05,587 Gossiper.java:968 - InetAddress /172.31.23.17 is now DOWN
INFO [GossipStage:1] 2015-06-14 03:18:05,614 Gossiper.java:968 - InetAddress /172.31.16.58 is now DOWN
INFO [GossipStage:1] 2015-06-14 03:18:05,647 Gossiper.java:968 - InetAddress /172.31.24.25 is now DOWN
INFO [GossipStage:1] 2015-06-14 03:18:05,687 Gossiper.java:968 - InetAddress /172.31.24.147 is now DOWN
These are nodes that I took offline earlier. I have purged the system.peers table of these nodes, but Gossip still acknowledges them as part of the cluster. The phantom presence of these nodes would push the cluster past 50% down. However, purging the gossip tables requires a full cluster shutdown.
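In case it helps anyone else: from what I have read, a full shutdown should not be necessary just to evict dead nodes. nodetool removenode takes the Host ID of a down node (the ID below is a placeholder; the real one comes from the nodetool status line of the DN node), and for a node that lingers only in Gossip and is no longer in the ring, Cassandra 2.1 (which DSE 4.7 ships) exposes Gossiper.unsafeAssassinateEndpoint over JMX:

```shell
# Sketch, run on any live node -- host ID below is hypothetical;
# take the real one from the DN row of `nodetool status`.
nodetool removenode 11111111-2222-3333-4444-555555555555

# If the removal stalls, it can be forced:
nodetool removenode force

# For a node present only in Gossip (not in the ring), call the
# Gossiper MBean over JMX, e.g. with jmxterm:
#   bean org.apache.cassandra.net:type=Gossiper
#   run unsafeAssassinateEndpoint 172.31.23.17
```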