spark streaming - waiting for a dead executor


I have a spark streaming application running inside a k8s cluster (using spark-operator).

I have 1 executor, reading batches every 5s from a Kinesis stream. The stream has 12 shards, and the executor creates 1 receiver per shard. I gave it 24 cores, which should be more than enough to handle that.
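For context, the receivers are created with the standard one-receiver-per-shard pattern from the Kinesis integration guide. A rough sketch of what that looks like (app/stream names, endpoint and region are placeholders, not my real values):

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val batchInterval = Seconds(5)
val sparkConf = new SparkConf().setAppName("kinesis-consumer") // placeholder name
val ssc = new StreamingContext(sparkConf, batchInterval)

val numShards = 12

// One receiver per shard, all unioned into a single DStream.
val kinesisStreams = (0 until numShards).map { _ =>
  KinesisUtils.createStream(
    ssc, "my-app", "my-stream",                              // placeholder app/stream names
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",  // placeholder endpoint/region
    InitialPositionInStream.LATEST, batchInterval, StorageLevel.MEMORY_AND_DISK_2)
}
val unionStream = ssc.union(kinesisStreams)

unionStream.count().print()
ssc.start()
ssc.awaitTermination()
```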

For some unknown reason, the executor sometimes crashes. I suspect its memory usage goes over the k8s pod memory limit, which would cause k8s to kill the pod, but I have not been able to prove this theory yet.
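If that theory is right, the relevant knobs would be the executor memory settings, since the JVM heap plus the off-heap overhead has to fit under the pod limit. A sketch with illustrative values only (not my actual config):

```scala
import org.apache.spark.SparkConf

// Illustrative values: heap + overhead must stay below the k8s pod memory limit,
// otherwise the kubelet OOM-kills the executor pod.
val conf = new SparkConf()
  .set("spark.executor.memory", "20g")         // executor JVM heap
  .set("spark.executor.memoryOverhead", "4g")  // headroom for off-heap / native memory
```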


After the executor crashes, a new one is created.

However, the "work" stops. The new executor is not doing anything.

I investigated a bit:

  • Looking at the logs of the new pod: it executed a few tasks successfully after it was created, and then stopped because it did not get any more tasks.

  • Looking in Spark Web UI - I see that there is 1 “running batch” that is not finishing.
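The same stage/task state can also be pulled from the Spark monitoring REST API instead of clicking through the UI. A minimal sketch, with the driver address and application id as placeholders:

```scala
import scala.io.Source

// Placeholders: driver address (port 4040 is the default UI/REST port) and app id.
val driver = "http://driver-host:4040"
val appId  = "app-00000000000000-0000"

// List applications, then look at the active stages: the stuck batch shows up
// as a stage whose tasks never complete.
val apps   = Source.fromURL(s"$driver/api/v1/applications").mkString
val stages = Source.fromURL(s"$driver/api/v1/applications/$appId/stages?status=active").mkString
println(stages)
```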


I found some docs saying there can only ever be 1 active batch at a time. So this is why the work stopped: it is waiting for this batch to finish.
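As far as I can tell, this corresponds to the spark.streaming.concurrentJobs setting, which defaults to 1, so only one batch's jobs run at a time. Shown here just to make the default explicit:

```scala
import org.apache.spark.SparkConf

// Default is 1: only one batch's jobs run at a time, so a batch that never
// finishes blocks every batch queued behind it.
val conf = new SparkConf()
  .set("spark.streaming.concurrentJobs", "1")
```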

Digging a bit deeper in the UI, I found a page that shows details about the tasks.


So executor 2 finished all the tasks it was assigned. There are 12 tasks assigned to executor 1 that are still waiting to finish, but executor 1 is dead.

Why does this happen? Why does Spark not know that executor 1 is dead and will never finish its assigned tasks? I would expect Spark to reassign these tasks to executor 2.
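For reference, these are the settings that I understand govern how quickly Spark declares an executor lost and re-schedules its tasks. A hedged sketch with what I believe are the defaults:

```scala
import org.apache.spark.SparkConf

// Defaults as I understand them. The driver is supposed to mark an executor as lost
// once its heartbeats stop, and then resubmit the tasks that were running on it.
val conf = new SparkConf()
  .set("spark.executor.heartbeatInterval", "10s") // executor -> driver heartbeat
  .set("spark.network.timeout", "120s")           // after this, the executor should be considered lost
  .set("spark.task.maxFailures", "4")             // retries per task before the job is failed
  .set("spark.speculation", "false")              // speculative re-launch of straggler tasks (off by default)
```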
