Storm [ERROR] Async loop died


I am using Storm 0.9.3. I am running a topology whose components are written in Python: the spout reads a URL off a Kafka queue and passes it to the next bolt, which fetches that page using the Python requests module.

Here is my topology definition in Java.

public static void main(String[] args) throws Exception {

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("URLSpout", new URLSpout(), 1);
    builder.setBolt("ScrapeBolt", new ScrapeBolt(), 30).shuffleGrouping("URLSpout");

    Config conf = new Config();
    //conf.setDebug(true);
    conf.setMessageTimeoutSecs(50);
    conf.setNumWorkers(3);
    conf.setMaxSpoutPending(50);
    conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE,             8);
    conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE,            32);
    conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
    conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE,    16384);
    StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
}

After the topology is submitted, it runs fine, but after some time it halts. When I checked the logs, I found these errors.

2015-06-09T17:34:49.733+0530 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: pid:5736, name:URLSpout exitCode:-1, errorString: 
    at backtype.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:178) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.spout.ShellSpout.nextTuple(ShellSpout.java:91) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.executor$fn__3373$fn__3388$fn__3417.invoke(executor.clj:565) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]
Caused by: java.lang.RuntimeException: Unknown command received: error
    at backtype.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:173) ~[storm-core-0.9.3.jar:0.9.3]
    ... 5 common frames omitted
2015-06-09T17:34:49.734+0530 b.s.d.executor [ERROR] 
java.lang.RuntimeException: pid:5736, name:URLSpout exitCode:-1, errorString: 
    at backtype.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:178) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.spout.ShellSpout.nextTuple(ShellSpout.java:91) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.executor$fn__3373$fn__3388$fn__3417.invoke(executor.clj:565) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]
Caused by: java.lang.RuntimeException: Unknown command received: error
    at backtype.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:173) ~[storm-core-0.9.3.jar:0.9.3]
    ... 5 common frames omitted
2015-06-09T17:34:49.745+0530 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__3812$fn__3813.invoke(worker.clj:456) [storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.executor$mk_executor_data$fn__3274$fn__3275.invoke(executor.clj:240) [storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]

And

java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
(Unable to capture error stream)

    at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:101) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:318) ~[storm-core-0.9.3.jar:0.9.3]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]
2015-06-09T17:34:49.779+0530 o.a.s.z.s.NIOServerCnxnFactory [ERROR] Thread Thread[Thread-37,5,main] died
java.lang.RuntimeException: java.lang.InterruptedException
    at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.zookeeper$exists_node_QMARK_$fn__1668.invoke(zookeeper.clj:102) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.cluster$mk_distributed_cluster_state$reify__1915.mkdirs(cluster.clj:119) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.cluster$mk_storm_cluster_state$reify__2372.report_error(cluster.clj:397) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.executor$throttled_report_error_fn$fn__3221.invoke(executor.clj:180) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.executor$fn__3441$fn$reify__3486.reportError(executor.clj:738) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.task.OutputCollector.reportError(OutputCollector.java:223) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.task.ShellBolt.die(ShellBolt.java:283) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.task.ShellBolt.access$400(ShellBolt.java:69) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:341) ~[storm-core-0.9.3.jar:0.9.3]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45-internal]
Caused by: java.lang.InterruptedException: null
    at java.lang.Object.wait(Native Method) ~[na:1.8.0_45-internal]
    at java.lang.Object.wait(Object.java:502) ~[na:1.8.0_45-internal]
    at org.apache.storm.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[storm-core-0.9.3.jar:0.9.3]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.zookeeper$exists_node_QMARK_$fn__1668.invoke(zookeeper.clj:101) ~[storm-core-0.9.3.jar:0.9.3]
    ... 11 common frames omitted
2015-06-09T17:34:49.780+0530 b.s.t.ShellBolt [ERROR] Halting process: ShellBolt died.

And

2015-06-09T18:13:15.392+0530 b.s.s.ShellSpout [ERROR] Halting process: ShellSpout died.
java.lang.RuntimeException: subprocess heartbeat timeout
    at backtype.storm.spout.ShellSpout$SpoutHeartbeatTimerTask.run(ShellSpout.java:255) [storm-core-0.9.3.jar:0.9.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45-internal]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45-internal]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45-internal]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]

Can someone please help me resolve this?


There is 1 answer

Jungtaek Lim

You're hitting https://issues.apache.org/jira/browse/STORM-796, which is fixed in Storm 0.9.5. Since Storm 0.9.4 and 0.9.5 contain only bug fixes, you can safely upgrade from 0.9.3 to 0.9.5.
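
For context, the "Unknown command received: error" line in the first trace is consistent with the Python spout subprocess sending an "error" command over the multilang pipe, which ShellSpout in 0.9.3 apparently does not recognize. Below is a minimal sketch of what such a message looks like on the wire, assuming the usual multilang framing (one JSON object followed by a line containing only "end"). The helper names encode_multilang and report_error are hypothetical and used only for illustration; they are not Storm's API.

```python
import json
import sys


def encode_multilang(message):
    """Frame a message for the multilang pipe: one JSON document,
    then a line containing only "end"."""
    return json.dumps(message) + "\nend\n"


def report_error(msg, out=sys.stdout):
    """Write an "error" command to the parent JVM process.

    Hypothetical helper for illustration: it mimics what a Python
    component writes to stdout when it reports an exception to Storm.
    """
    out.write(encode_multilang({"command": "error", "msg": msg}))
    out.flush()
```

Calling report_error("could not fetch URL") from a spout writes the JSON object {"command": "error", "msg": "could not fetch URL"} followed by an "end" line to stdout. As I understand the linked issue, that is the command the 0.9.3 ShellSpout rejects and that 0.9.5 accepts, so it is also worth checking what exception your spout is reporting in the first place.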