Deploying Multi-Service RAG Application on Google Cloud Run with Docker: Connection Issues with Ollama


I'm deploying a RAG application on Google Cloud Run using a Docker container, integrating Flask, Streamlit, and Chainlit services. Despite configuring my Dockerfile to set up the environment and start services via a bash script, I encounter connection issues with Ollama, preventing my RAG app from functioning properly.

Error raised by inference endpoint: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3e19f5a96ef0>: Failed to establish a new connection: [Errno 111] Connection refused'))

Questions:

1. How can I ensure Ollama starts correctly and is accessible to my RAG application on Google Cloud Run?
2. Is a single Docker image viable for this multi-service application, or should I split the services into separate images?

Thanks in advance for your time, attention, and help. Truly appreciated.

Dockerfile

FROM python:3.10-slim
WORKDIR /app
# Copy requirements first so the dependency layer is cached across code changes
COPY requirements.txt .
RUN pip install --upgrade pip && pip install -r requirements.txt
COPY . .
RUN chmod +x start_services.sh
ENV OLLAMA_HOST="0.0.0.0:11434"
EXPOSE 5000 11434
CMD ["./start_services.sh"]
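One gap I see: `python:3.10-slim` contains neither `curl` nor the Ollama binary, so nothing in the image above can actually serve port 11434. A sketch of installing Ollama via its official install script (untested on Cloud Run; the script targets systemd hosts but still places the `ollama` binary, and GPU acceleration won't apply on default Cloud Run):

```dockerfile
FROM python:3.10-slim
# The slim base image ships neither curl nor Ollama; install both.
RUN apt-get update && apt-get install -y curl \
 && curl -fsSL https://ollama.com/install.sh | sh \
 && rm -rf /var/lib/apt/lists/*
```

Also worth noting: Cloud Run routes external traffic to a single container port, so port 11434 would only be reachable as `localhost` from inside the same container, which is fine for this setup.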

start_services.sh:

#!/bin/bash
# Start the Ollama server in the background, then run Chainlit in the foreground.
# Note: the server command is "ollama serve", not bare "ollama", and the trap
# must be registered before the blocking chainlit call, or it never takes effect.
ollama serve &
ollama_pid=$!
trap "kill $ollama_pid" EXIT
chainlit run rag.py
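I also suspect a race: `ollama serve` takes a moment to bind its port, so Chainlit may fire its first request into a closed socket, which would explain the `Connection refused`. An untested sketch of a readiness check, assuming `curl` is available in the image (the slim base doesn't ship it) and using Ollama's `/api/tags` endpoint as a cheap liveness probe:

```shell
#!/bin/bash
# Sketch: block until the Ollama HTTP endpoint answers, or give up
# after a timeout. Assumes curl is installed in the container.
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"
  local timeout="${2:-30}"
  local elapsed=0
  # /api/tags is a cheap GET that succeeds once the server is listening
  until curl -sf "${url}/api/tags" > /dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Ollama not reachable at ${url} after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

In start_services.sh this would sit between `ollama serve &` and `chainlit run rag.py`, so Chainlit only starts once the server is actually accepting connections.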

Attempts to install and start Ollama, both via pip and directly in the Dockerfile, have been unsuccessful, and the image also fails to run properly locally. I'm considering splitting the services into separate Docker images, but would prefer a single-image solution if feasible. The fallback would be dropping Ollama in favor of closed-source LLMs, which would be sad.

