I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.
The database (postgres) checks its own health using pg_isready, and the backend (FastAPI) checks its health via the endpoint http://localhost:8080/healthcheck.
Compose file:
version: '3'
services:
  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s
  backend:
    depends_on:
      database:
        condition: service_healthy
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile
FastAPI app
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
So far this all works as expected. If, for example, I had a typo in my healthcheck endpoint route (in my app), startup would fail, like so:
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:01:44.411 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:01:44.414 UTC [22] LOG: database system was shut down at 2023-06-01 22:51:10 UTC
database | 2023-06-01 23:01:44.417 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [8]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy
Where I'm getting confused is that, after a successful startup, if I change the app in a way that should make backend unhealthy, the reloader picks up the change and the check returns a 404 (as expected), but the container never becomes unhealthy.
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:06:37.397 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:06:37.400 UTC [22] LOG: database system was shut down at 2023-06-01 23:06:34 UTC
database | 2023-06-01 23:06:37.403 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [9]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend |
frontend | > [email protected] dev
frontend | > vite --host
frontend |
frontend | Forced re-optimization of dependencies
frontend |
frontend | VITE v4.3.1 ready in 285 ms
frontend |
frontend | ➜ Local: http://localhost:5173/
frontend | ➜ Network: http://172.26.0.4:5173/
backend | INFO: 127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend | WARNING: StatReload detected changes in 'src/main.py'. Reloading...
backend | INFO: Shutting down
backend | INFO: Waiting for application shutdown.
backend | INFO: Application shutdown complete.
backend | INFO: Finished server process [9]
backend | INFO: Started server process [76]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found
What I expected:
While running after a successful startup, upon changing the backend code in such a way that its healthcheck would fail, I expected frontend to exit or become degraded somehow, as its health dependency has failed.
What happened:
Everything kept running as if nothing happened, even though the backend healthcheck returned a failing value.
My questions:
- Is the healthcheck only valid during startup to wait for a container to be "ready"? Documentation seems to suggest so.
- If so, then why keep checking for health after successful startup?
- If not, why is the backend container not being marked as unhealthy when changes cause its healthcheck to fail while running?
- Is there a way to degrade a container to unhealthy while running after a successful startup?
- I'm aware that I can use kill 1 instead of exit 1 and that would cause the backend container to stop, but that doesn't seem very clean.
In trying to reproduce the behavior you've described, the first problem I ran into is that the standard version of wget will make HEAD requests when using the --spider option, so that your healthcheck results in:
This is using wget version 1.21 as installed in the python:3.11 image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your docker-compose.yaml):
I have your example FastAPI code in backend/backend.py, and my backend/Dockerfile looks like:
When I run docker-compose up, I see:
...and the container enters the "healthy" state:
If I docker exec into the container and modify the FastAPI application to return an error, so that the code looks like this:
And the container enters the "unhealthy" state:
That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
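If you want to watch the status change on your side, one option is to read the health state that Docker records for the container. The following is a minimal sketch, not the commands used above: it assumes the container is named backend and that the docker CLI is on your PATH, and the helper names parse_health and container_health are made up for illustration. It relies on docker inspect printing a JSON array whose State.Health object carries Status and FailingStreak.

```python
import json
import subprocess


def parse_health(inspect_output: str) -> tuple[str, int]:
    """Return (status, failing_streak) from the JSON printed by `docker inspect`."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    health = containers[0]["State"].get("Health")
    if health is None:
        # No healthcheck configured; "none" is this sketch's own placeholder.
        return ("none", 0)
    return (health["Status"], health.get("FailingStreak", 0))


def container_health(name: str) -> tuple[str, int]:
    """Run `docker inspect <name>` and extract the health fields."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_health(result.stdout)


# Hypothetical usage, assuming a running container named "backend":
# status, streak = container_health("backend")
# print(status, streak)
```

Polling this once a second while you edit the app should show whether FailingStreak climbs and whether Status ever leaves "healthy".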
Here are some questions to help further diagnose things on your end:
- What does the Dockerfile for your FastAPI service look like? In particular, what's the base image?
- Have you verified that the wget command in that image returns an error code as expected for a non-200 response from the server?
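If the wget in your image turns out to behave differently from what you expect, one way to take it out of the picture is to do the check with Python itself, since the backend image already has an interpreter. This is only a sketch under assumptions: the function name is_healthy and the default URL are illustrative, and it treats anything other than an HTTP 200 on a GET request as a failure.

```python
import sys
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib raises HTTPError for 4xx/5xx responses; it is a URLError,
        # which is an OSError. Refused connections and timeouts land here too.
        return False


def main(argv: list[str]) -> int:
    """Map the check onto an exit code: 0 for healthy, 1 for unhealthy."""
    url = argv[1] if len(argv) > 1 else "http://localhost:8080/healthcheck"
    return 0 if is_healthy(url) else 1


# When saved as a standalone script, finish with:
# sys.exit(main(sys.argv))
```

Saved as a script inside the image, it could be used as the healthcheck test command in place of the wget line, so a 404 from the endpoint yields exit code 1 regardless of which wget build the base image ships.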