I've set up a Kubernetes cluster on GKE and installed the dask-kubernetes-operator. When I try to start the cluster like this:
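```python
from dask.distributed import Client
from dask_kubernetes.operator import KubeCluster

# cluster.yaml is a DaskCluster spec (see below)
cluster: KubeCluster = KubeCluster(custom_cluster_spec="cluster.yaml")
client = Client(cluster)
client
```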
where cluster.yaml is essentially the cluster-spec.yaml example from this website but with my own image (based on ghcr.io/dask/dask:2023.10.0-py3.10), I get the following error, often several times in a row:
```
Task exception was never retrieved
future: <Task finished name='Task-822' coro=<PortForward._sync_sockets() done, defined at /opt/conda/lib/python3.10/site-packages/kr8s/_portforward.py:167> exception=ExceptionGroup('unhandled errors in a TaskGroup', [ConnectionClosedError('TCP socket closed')])>
  + Exception Group Traceback (most recent call last):
  |   File "/opt/conda/lib/python3.10/site-packages/kr8s/_portforward.py", line 171, in _sync_sockets
  |     async with anyio.create_task_group() as tg:
  |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 664, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/conda/lib/python3.10/site-packages/kr8s/_portforward.py", line 183, in _tcp_to_ws
    |     raise ConnectionClosedError("TCP socket closed")
    | kr8s._exceptions.ConnectionClosedError: TCP socket closed
    +------------------------------------
```
The scheduler pod logs also say `distributed.comm.tcp - INFO - Connection from tcp://127.0.0.1:51484 closed before handshake completed`. Here is the full scheduler pod log:
```
+ '[' '' ']'
+ '[' '' == true ']'
+ CONDA_BIN=/opt/conda/bin/conda
+ '[' -e /opt/app/environment.yml ']'
+ echo 'no environment.yml'
+ '[' '' ']'
+ '[' '' ']'
+ exec dask-scheduler
no environment.yml
/opt/conda/lib/python3.10/site-packages/distributed/cli/dask_scheduler.py:142: FutureWarning: dask-scheduler is deprecated and will be removed in a future release; use `dask scheduler` instead
  warnings.warn(
2023-11-23 12:35:29,548 - distributed.scheduler - INFO - -----------------------------------------------
2023-11-23 12:35:30,422 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2023-11-23 12:35:30,473 - distributed.scheduler - INFO - State start
2023-11-23 12:35:30,478 - distributed.scheduler - INFO - -----------------------------------------------
2023-11-23 12:35:30,479 - distributed.scheduler - INFO - Scheduler at: tcp://10.12.0.35:8786
2023-11-23 12:35:30,480 - distributed.scheduler - INFO - dashboard at: http://10.12.0.35:8787/status
2023-11-23 12:35:30,480 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2023-11-23 12:41:27,053 - distributed.comm.tcp - INFO - Connection from tcp://127.0.0.1:51484 closed before handshake completed
2023-11-23 12:41:54,805 - distributed.scheduler - INFO - Receive client connection: Client-adc9e3c2-89fd-11ee-8284-0242ac120003
2023-11-23 12:41:54,806 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:38964
```
I've tried increasing `initialDelaySeconds` and checked that the versions match, but neither helped. I couldn't find much else about this error online.
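When comparing versions (or confirming that an upgrade actually took effect), it can help to print exactly what is installed in each environment. The following is a minimal sketch using only the standard library; the package list is an assumption, picked from the names that appear in the setup and traceback above, so adjust it as needed. Run it both locally and inside the cluster image. For the running cluster itself, distributed's `Client.get_versions(check=True)` compares the client, scheduler and worker environments and raises if required packages differ.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of relevant packages, taken from the setup and traceback above.
PACKAGES = ["dask", "distributed", "dask-kubernetes", "kr8s", "anyio"]


def installed_versions(packages=PACKAGES):
    """Return {package name: installed version string, or None if missing}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```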
This is just a noisy warning; it shouldn't stop your code from working. The bug that caused it has since been fixed, so upgrading to newer versions should make it go away.
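If upgrading isn't possible straight away, one workaround (not something provided by kr8s or dask-kubernetes) is to filter the message out of logging. asyncio reports unretrieved task exceptions through the `"asyncio"` logger, with the first line reading "Task exception was never retrieved" and a `future:` line containing the task's repr, as in the traceback above. The sketch below assumes that text format and drops only records that also mention "TCP socket closed"; `IgnorePortForwardSocketClosed` is a hypothetical name.

```python
import logging


class IgnorePortForwardSocketClosed(logging.Filter):
    """Hypothetical filter: drop asyncio's "Task exception was never retrieved"
    records caused by a closed port-forward socket; pass everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        is_unretrieved = message.startswith("Task exception was never retrieved")
        # Assumes the task repr includes the exception text shown in the traceback.
        is_socket_closed = "TCP socket closed" in message
        return not (is_unretrieved and is_socket_closed)


# asyncio logs unretrieved task exceptions via the "asyncio" logger.
logging.getLogger("asyncio").addFilter(IgnorePortForwardSocketClosed())
```

This only hides the log record; other asyncio errors still get through.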