How to properly configure Django Channels in production using AWS, Docker, Nginx + Daphne?

We are attempting to configure a live chat on our website using Django Channels 2, AWS, and Nginx + Daphne. Our setup works fine locally, but we are running into issues when deploying to production.

Our production environment consists of two Docker containers deployed to AWS Elastic Container Service (Fargate). The front container runs nginx, acting as a proxy server that also serves static files. The second container runs our API/Django site. The proxy listens on port 8000 and forwards incoming requests to the API/Django container, which runs on port 9000. I will also note that we use Terraform to configure our AWS environment.

I have referenced multiple articles that have accomplished similar setups. For example: https://medium.com/@elspanishgeek/how-to-deploy-django-channels-2-x-on-aws-elastic-beanstalk-8621771d4ff0

However, that setup uses an Elastic Beanstalk deployment, which we are not using.

(Image: setup example)

Proxy Dockerfile:

FROM nginxinc/nginx-unprivileged:1-alpine
LABEL maintainer='CodeDank'

COPY ./default.conf.tpl /etc/nginx/default.conf.tpl
COPY ./uwsgi_params /etc/nginx/uwsgi_params

ENV LISTEN_PORT=8000
ENV APP_HOST=app
ENV APP_PORT=9000

USER root

RUN mkdir -p /vol/static
RUN chmod 755 /vol/static
RUN touch /etc/nginx/conf.d/default.conf
RUN chown nginx:nginx /etc/nginx/conf.d/default.conf

COPY ./entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

USER nginx

CMD ["/entrypoint.sh"]

API/site Dockerfile:

FROM python:3.7-alpine3.11
LABEL maintainer="CodeDank"

ENV PYTHONUNBUFFERED 1
ENV PATH="/scripts:${PATH}"

RUN pip install --upgrade pip

COPY ./requirements.txt /requirements.txt
RUN apk add --update --no-cache postgresql-client jpeg-dev
RUN apk add --update --no-cache --virtual .tmp-build-deps \
        gcc libc-dev linux-headers postgresql-dev \
        musl-dev zlib zlib-dev
RUN apk add --update --no-cache libressl-dev musl-dev libffi-dev
RUN apk add --update --no-cache g++ freetype-dev jpeg-dev
RUN pip install -r /requirements.txt
RUN apk del .tmp-build-deps

RUN mkdir /app
WORKDIR /app
COPY ./app /app
COPY ./scripts /scripts
RUN chmod +x /scripts/*

RUN mkdir -p /vol/web/media
RUN mkdir -p /vol/web/static
RUN adduser -D user
RUN chown -R user:user /vol/
RUN chmod -R 755 /vol/web
USER user

CMD ["entrypoint.sh"]

(entrypoint scripts shown below)

We have created an AWS ElastiCache Redis server to use as the CHANNEL_LAYERS backend for Django Channels. The 'REDIS_HOSTNAME' environment variable is the endpoint address of the Redis server.

# Channels Settings
ASGI_APPLICATION = "app.routing.application"
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [
                (os.environ.get('REDIS_HOSTNAME'), 6379)
            ],
        },
    },
}
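
As a quick sanity check that the container can actually reach that ElastiCache endpoint through the channel layer, the round-trip test from the Channels docs can be run from python manage.py shell inside the API container (the channel name and message below are arbitrary):

# Run inside "python manage.py shell" in the API/Django container.
import channels.layers
from asgiref.sync import async_to_sync

channel_layer = channels.layers.get_channel_layer()
async_to_sync(channel_layer.send)('test_channel', {'type': 'hello'})
print(async_to_sync(channel_layer.receive)('test_channel'))
# Expected output: {'type': 'hello'} -- anything else (or a hang)
# points at connectivity or security-group issues with the Redis server.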

asgi.py file:

import os
import django
from channels.routing import get_default_application


os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'app.settings')
django.setup()
application = get_default_application()
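
For context, the app.routing module referenced by ASGI_APPLICATION is not shown in the question; under Channels 2 it would look roughly like this sketch, where the chat app and ChatConsumer are hypothetical names, not actual project code:

# app/routing.py -- minimal Channels 2 sketch; "chat" and "ChatConsumer"
# are hypothetical names for illustration only.
from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from django.urls import re_path

from chat import consumers

application = ProtocolTypeRouter({
    # HTTP is handled by Django via uwsgi; only websocket connections
    # (everything under /ws/) should reach this router through Daphne.
    'websocket': AuthMiddlewareStack(
        URLRouter([
            re_path(r'^ws/chat/(?P<room_name>[^/]+)/$', consumers.ChatConsumer),
        ])
    ),
})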

Following the Channels docs, we are attempting to configure Daphne to serve the ASGI application in our project. Ideally, the nginx proxy would forward all websocket requests to the Daphne server running on port 9001. All of our websocket endpoints contain /ws/, so the nginx proxy configuration is defined as shown below.

default.conf.tpl:

upstream channels-backend {
    server localhost:9001;
}

server {
    listen ${LISTEN_PORT};

    location /static {
        alias /vol/static;
    }

    location / {
        uwsgi_pass              ${APP_HOST}:${APP_PORT};
        include                 /etc/nginx/uwsgi_params;
        client_max_body_size    4G;
    }

    location /ws/ {
        proxy_pass              http://channels-backend;
        proxy_http_version      1.1;
        proxy_set_header        Upgrade $http_upgrade;
        proxy_set_header        Connection "upgrade";
        proxy_redirect          off;
        proxy_set_header        Host $host;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header        X-Forwarded-Host $server_name;
    }
}

Proxy entrypoint script:

#!/bin/sh

set -e

envsubst '${LISTEN_PORT},${APP_HOST},${APP_PORT}' < /etc/nginx/default.conf.tpl > /etc/nginx/conf.d/default.conf
nginx -g 'daemon off;'

API/site entrypoint script:

#!/bin/sh

set -e

python manage.py collectstatic --noinput
python manage.py wait_for_db
python manage.py migrate

uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi

daphne -b 0.0.0.0 -p 9001 app.asgi:application

Upon trying to connect to the websocket on our site, a 502 error is returned.

Error during WebSocket handshake: Unexpected response code: 502.

I suspect that the daphne server is either not running as we expect or is not properly configured with the nginx server. Within the API entrypoint script, would the daphne command even be run as it currently stands? Or is there anything we are missing to get daphne running behind the nginx proxy? My initial thought is that the daphne command cannot run after the uwsgi command in the entrypoint script, but I am not sure where else to place it so that the daphne process starts.

The CloudWatch logs for the proxy are not very detailed, but I receive this error message when attempting to connect to a websocket on the site.

[error] 8#8: *53700 connect() failed (111: Connection refused) while connecting to upstream, client: 10.1.1.190, server: , request: "GET /ws/chat/djagno/ HTTP/1.1", upstream: "http://127.0.0.1:9001/ws/chat/djagno/", host: "mycustomdomain.net"

I have seen other approaches to this problem that do not involve using the nginx proxy to route websocket traffic to daphne. Maybe our approach is not the best solution? We are open to alternative configurations.

Any feedback would be greatly appreciated. Thanks!

There are 3 answers

Answer from AndroidonEarth

Since you mentioned you are using Terraform for your AWS deployments, I would check the configuration of your AWS security groups, specifically where you set up the security groups between your ECS tasks and ElastiCache Redis.

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/elasticache_cluster

Edit: On second glance, I noticed how you are starting uwsgi and daphne. As written, uwsgi starts in the foreground and the script blocks on it, so daphne never gets started (hence the 502 error).

Change

uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi

daphne -b 0.0.0.0 -p 9001 app.asgi:application

to

uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi & daphne -b 0.0.0.0 -p 9001 app.asgi:application

This will start uwsgi in the background and then move on to start Daphne.

If you then need a way to kill both, you can run this in a script and add a wait at the end, so that killing the script also kills the uwsgi and daphne processes (see the sketch below). Otherwise, you can look into daemonizing the uwsgi and daphne startups with systemd or supervisor.
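
For illustration, here is a rough Python equivalent of that background-plus-wait idea; this is purely a hypothetical sketch, and the wait_for_db/migrate/collectstatic steps from the original entrypoint are omitted:

#!/usr/bin/env python
# entrypoint.py -- hypothetical sketch: start both servers, stop both when
# either exits or the container receives a stop signal, so ECS can restart
# the task.
import os
import signal
import subprocess
import sys

procs = [
    subprocess.Popen(['uwsgi', '--socket', ':9000', '--workers', '4',
                      '--master', '--enable-threads', '--module', 'app.wsgi']),
    subprocess.Popen(['daphne', '-b', '0.0.0.0', '-p', '9001',
                      'app.asgi:application']),
]

def stop_all(signum=None, frame=None):
    for p in procs:
        if p.poll() is None:  # still running
            p.terminate()

signal.signal(signal.SIGTERM, stop_all)  # e.g. ECS task shutdown
signal.signal(signal.SIGINT, stop_all)

os.wait()        # block until the first child exits
stop_all()
sys.exit(1)      # non-zero exit so the orchestrator replaces the task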

Answer from Paul Tuckett

There could be a few issues here. The first thing I discovered when dealing with websocket requests is that they behave differently on a server than they do on localhost. I had to modify my Django Channels logic in several areas, depending on the versions of Django, Django Channels, Daphne, etc.

For example: when we upgraded to Channels 3.0, we could not access our database without the database_sync_to_async() decorator and had to offload those calls into their own separate functions, as in the sketch below.
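
That pattern looks roughly like this (Message and the chat app are hypothetical names, not from the question):

# Sketch of the Channels 3 pattern described above: ORM access moved into
# its own function and wrapped so it runs in a worker thread rather than
# blocking the event loop.
from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncJsonWebsocketConsumer

from chat.models import Message  # hypothetical model

class ChatConsumer(AsyncJsonWebsocketConsumer):
    @database_sync_to_async
    def _save_message(self, text):
        # Synchronous ORM call, isolated in its own function.
        return Message.objects.create(text=text)

    async def receive_json(self, content):
        await self._save_message(content.get('text', ''))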

Check your routing.py for request stoppers like AllowedHostsOriginValidator, as shown below.
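
That is, something along these lines in routing.py (websocket_urlpatterns and the chat.routing module are placeholders for your own URL patterns):

# If the validator is present, a handshake whose Origin is not covered by
# settings.ALLOWED_HOSTS is rejected before any consumer runs -- a common
# cause of websockets that work locally but fail in production.
from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from channels.security.websocket import AllowedHostsOriginValidator

from chat.routing import websocket_urlpatterns  # hypothetical patterns module

application = ProtocolTypeRouter({
    'websocket': AllowedHostsOriginValidator(
        AuthMiddlewareStack(URLRouter(websocket_urlpatterns))
    ),
})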

If you are using custom middleware, the scope object differs based on your environment, and so does the way you access its data.

Also, try running Daphne outside of your daemon process, through a Unix socket, like so:

daphne -u /etc/supervisor/socks/daphne.sock --fd 0 --access-log - --proxy-headers project.asgi:application -v 3

We use the following setup, if you want to give it a go.

Load balancing nginx config:

upstream mywebapp {
    server front_end_ip:port;
}

# Upgrades the connection from HTTP(S) to websocket when the client requests it
map $http_upgrade $connection_upgrade {
    default   upgrade;
    ''        close;
}

location /ws/ {
    add_header X-debug-message "The /ws/ location was served from the ascend load balancer" always;
    proxy_pass http://mywebapp/ws/;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_read_timeout 86400;
}

Front end nginx config:

upstream mybackend {
    server  django_server_ip:port;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

location /ws/ {
    add_header X-debug-message "The /ws/ location was served from angular 1" always;
    proxy_pass http://mybackend/ws/;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_read_timeout 86400;
}

Django server nginx config:

upstream daphne {
    server 0.0.0.0:9001;
}

location /ws/ {
    add_header X-debug-message "The /ws/ location was served from daphne" always;
    proxy_pass http://daphne;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_cache_bypass $http_upgrade;
}

Answer from f7o

One thing which comes to my mind: are you scaling the nginx container? You might need to enable session stickiness on your Application Load Balancer in order to make websockets work; a sketch of one way to do that is below.
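
In Terraform this corresponds to the stickiness block on the aws_lb_target_group resource; equivalently, a hedged boto3 sketch (the target group ARN is a placeholder):

# Hypothetical sketch: enable ALB cookie-based stickiness on the target
# group that fronts the nginx containers.
import boto3

elbv2 = boto3.client('elbv2')
elbv2.modify_target_group_attributes(
    TargetGroupArn='arn:aws:elasticloadbalancing:...',  # your target group ARN
    Attributes=[
        {'Key': 'stickiness.enabled', 'Value': 'true'},
        {'Key': 'stickiness.type', 'Value': 'lb_cookie'},
    ],
)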

Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#sticky-sessions