JupyterHub not finding my workspace and killing it (version 4.1.13)


I'm running JupyterHub inside Kubernetes. The hub and the workspace are separate Docker images, and configurable-http-proxy runs in its own container. When I launch the workspace from the hub page, the hub receives the activity ping from the workspace pod, but it never finds the pod.

The logs and the things I tried are below. One more thing I noticed: the proxy's routes contain nothing for my user/workspace:

 curl -v -H "Authorization: token $TOKENHERE" http://XX.XX.XX.XX:XXXXX/api/routes

{"/":{"hub":true,"target":"http://jupyter-lab-service:8081","jupyterhub":true,"last_activity":"2024-03-28T22:12:23.899Z"}}
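The same check can be scripted, which makes it easier to repeat while a spawn is in progress. This is only a minimal sketch: the function names `fetch_routes` and `has_user_route` are made up for illustration, and the endpoint and `Authorization: token` header are simply the ones from the curl command above.

```python
import json
import urllib.request


def fetch_routes(api_url: str, token: str) -> dict:
    """GET <api_url>/api/routes from the proxy and decode the JSON body."""
    req = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/routes",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def has_user_route(routes: dict, username: str) -> bool:
    """Return True if the routing table has an entry for /user/<username>."""
    prefix = f"/user/{username}"
    # Compare without a trailing slash so "/user/x" and "/user/x/" both match.
    return any(path.rstrip("/") == prefix for path in routes)


# Usage (placeholders, not real values):
# routes = fetch_routes("http://XX.XX.XX.XX:XXXXX", token)
# print(has_user_route(routes, "<myusername>"))
```

With the output shown above, where the only key is `/`, `has_user_route` returns False for any username.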

This is the only thing I can find in the logs:

timestamp="2024-03-28T16:00:50,184-0700",name="JupyterHub",level="WARNING",message="'s server never showed up at http://XX.XX.XX.XX:8888/user// after 60 seconds. Giving up.

Common causes of this timeout, and debugging tips:
    1. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.
    2. The server started, but is not accessible at the specified URL.
       This may be a configuration issue specific to your chosen Spawner.
       Check the single-user server logs and resource to make sure the URL
       is correct and accessible from the Hub.
    3. (unlikely) Everything is working, but the server took too long to respond.
       To fix: increase `Spawner.http_timeout` configuration
       to a number of seconds that is enough for servers to become responsive.

I believe #1 and #2 aren't the problem, because here are the logs from the user pod:

timestamp="2024-03-28 16:03:19,555",name="ServerApp",level="INFO",message="Serving notebooks from local directory: /workspace/user/<myusername>"
timestamp="2024-03-28 16:03:19,555",name="ServerApp",level="INFO",message="Jupyter Server 2.5.0 is running at:"
timestamp="2024-03-28 16:03:19,555",name="ServerApp",level="INFO",message="http://<podname>:8888/user/<myusername>/lab?token=..."
timestamp="2024-03-28 16:03:19,555",name="ServerApp",level="INFO",message="    http://127.0.0.1:8888/user/<myusername>/lab?token=..."
timestamp="2024-03-28 16:03:19,555",name="ServerApp",level="INFO",message="Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)."
timestamp="2024-03-28 16:03:19,565",name="JupyterHubSingleUser",level="DEBUG",message="Notifying Hub of activity 2024-03-28T23:03:19.224418Z"

ServerApp starts and notifies the hub of activity. I can also reach the URL where the hub says the server should be:

http://XX.XX.XX.XX:8888/user/<myusernamehere>/

That was just a plain curl, to check that the address is reachable and that something is accepting requests there.
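A reachability check like this is more telling when it is run from inside the hub container, since the hub is the one that gives up waiting. Below is a rough sketch of such a probe. The helper names `probe` and `counts_as_up` are hypothetical, and the rule that any HTTP response below 500 counts as "up" is my understanding of how the hub decides a server has started, not something I have confirmed for this version.

```python
import urllib.error
import urllib.request


def counts_as_up(status: int) -> bool:
    # Assumption: any HTTP response below 500 means the server is answering.
    return status < 500


def probe(url: str, timeout: float = 5.0):
    """Return the HTTP status for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An HTTP response did arrive, it just was not a success code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


# Usage (placeholder URL, run from the hub pod):
# status = probe("http://XX.XX.XX.XX:8888/user/<myusernamehere>/")
# print(status, status is not None and counts_as_up(status))
```

A result of None from the hub pod, while the same URL answers from elsewhere, would point at networking between the hub and the workspace pod rather than at the workspace itself.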

Here is my hub config:

# Imports used by the excerpts below
import os
import socket

## Enable debug-logging
c.JupyterHub.log_datefmt = '%Y-%m-%dT%H:%M:%S'
c.JupyterHub.log_format = 'timestamp="%(asctime)s,%(msecs).03s-0700",name="%(name)s",level="%(levelname)s",message="%(message)s"'
c.JupyterHub.log_level = 'DEBUG'
c.Spawner.debug = True


kube_namespace_pod_limit = 100
c.JupyterHub.active_server_limit = kube_namespace_pod_limit - 2

## Duration (in seconds) to determine the number of active users.
c.JupyterHub.active_user_window = 3600

##Auth stuff: omitting ###

JUPYTERHUB_PORT = os.environ.get("JUPYTERHUB_PORT", "18888")
c.JupyterHub.bind_url = os.environ.get("JUPYTERHUB_BIND_URL", f"http://0.0.0.0:{JUPYTERHUB_PORT}")


#  The Hub should be able to resume from database state.
c.JupyterHub.cleanup_proxy = False

#  The Hub should be able to resume from database state.
c.JupyterHub.cleanup_servers = False

#  If set to 0, no limit is enforced. (Spawning fails even with only one spawn at a time.)
c.JupyterHub.concurrent_spawn_limit = 20

c.JupyterHub.cookie_max_age_days = 14

## pingback handler stuff: omitting here ###


import coco_jupyter_extensions.nbsharing
c.JupyterHub.extra_handlers = coco_jupyter_extensions.nbsharing.HANDLERS + [
    ("/usagepingback", UsagePingbackHandler),
] + [(f"/{k}(/.*)?", SimpleSparkRewritingProxy) for k in REWRITES.keys()]


_hostname = os.environ.get("JUPYTERHUB_SERVICE_NAME", socket.gethostbyname(socket.gethostname()))
#c.JupyterHub.hub_connect_url = f"http://{_hostname}:18888/"
c.JupyterHub.hub_connect_ip = _hostname



c.JupyterHub.hub_ip = '0.0.0.0'

c.JupyterHub.init_spawners_timeout = -1

c.Proxy.should_start = False  # proxy is served from a diff container
c.JupyterHub.proxy_class = 'jupyterhub.proxy.ConfigurableHTTPProxy'

with open(os.environ.get(
    "CONFIGPROXY_AUTH_TOKEN_FILE",
    "/secrets/dsj/hub_configproxy_auth_token"
), "rt") as token_file:
    c.ConfigurableHTTPProxy.auth_token = token_file.read()

c.ConfigurableHTTPProxy.check_running_interval = 2  # seconds



found_proxy_api_url = None
for env_key in os.environ:
    # find e.g. DEV*_JUPYTERHUB_PROXY_SERVICE_PORT_28001_TCP=tcp://172.31.123.69:28001
    if all([
        "JUPYTERHUB_PROXY_SERVICE" in env_key,
        env_key.endswith("28001_TCP"),
        os.environ[env_key].startswith("tcp://")
    ]):
        # https://jupyterhub.readthedocs.io/en/stable/reference/api/proxy.html#configurablehttpproxy
        found_proxy_api_url = os.environ[env_key].replace("tcp://", "http://")
        c.ConfigurableHTTPProxy.api_url = found_proxy_api_url
        break
else:
    raise ValueError("Cannot auto-discover ConfigurableHTTPProxy.api_url from container env.")


c.JupyterHub.redirect_to_server = False

import sys
workspace_timeout_seconds = os.environ.get("DSJ_WORKSPACE_TIMEOUT_SECONDS", 60*60*48)
cull_every_seconds = 60 * 15 # check every 15 minutes
c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [
            sys.executable,
            '-m', 'jupyterhub_idle_culler',
            '--url=http://localhost:8081/hub/api',
            f'--timeout={workspace_timeout_seconds}',
            f'--cull-every={cull_every_seconds}',
        ],
    }
]

c.JupyterHub.shutdown_on_logout = False

c.Spawner.default_url = '/lab'

c.Spawner.http_timeout = 60

c.Spawner.start_timeout = 60 * 10

### User and roles stuff: omitting ###
c.Authenticator.enable_auth_state = True
