celeryd and celerybeat pid files are not being created, workers are not starting, but output says OK


I set up celeryd and celerybeat as daemons, and they worked until recently. Now the init script no longer starts the workers, and no pid files are created.

Here is my /etc/default/celeryd:

# Name of nodes to start
CELERYD_NODES="w1 w2 w3 w4 w5 w6 w7 w8"

# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=300 --concurrency=8"

# Where to chdir at start.
CELERYD_CHDIR="/srv/www/web-system/myproject"

# %n will be replaced with the nodename.
#CELERYD_LOG_FILE="/var/log/celery/%n.log"
#CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_LOG_FILE="/srv/www/web-system/logs/celery/%n.log"
CELERYD_PID_FILE="/srv/www/web-system/pids/celery/%n.pid"

# Log level to use for celeryd. Default is INFO.
CELERYD_LOG_LEVEL="INFO"

# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"

# How to call "manage.py celeryctl"
CELERYCTL="$CELERYD_CHDIR/manage.py celeryctl"

# Workers should run as an unprivileged user.
#CELERYD_USER="celery"
#CELERYD_GROUP="celery"
CELERYD_USER="myuser"
CELERYD_GROUP="myuser"

# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="myproject.settings"

and here's the init.d script:

#!/bin/sh -e
# ============================================
#  celeryd - Starts the Celery worker daemon.
# ============================================
#
# :Usage: /etc/init.d/celeryd {start|stop|force-reload|restart|try-restart|status}
# :Configuration file: /etc/default/celeryd
#
# See http://docs.celeryproject.org/en/latest/tutorials/daemonizing.html#generic-init-scripts


### BEGIN INIT INFO
# Provides:          celeryd
# Required-Start:    $network $local_fs $remote_fs
# Required-Stop:     $network $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: celery task worker daemon
### END INIT INFO

# some commands work asynchronously, so we'll wait this many seconds
SLEEP_SECONDS=5

DEFAULT_PID_FILE="/var/run/celery/%n.pid"
DEFAULT_LOG_FILE="/var/log/celery/%n.log"
DEFAULT_LOG_LEVEL="INFO"
DEFAULT_NODES="celery"
DEFAULT_CELERYD="-m celery.bin.celeryd_detach"

CELERY_DEFAULTS=${CELERY_DEFAULTS:-"/etc/default/celeryd"}

test -f "$CELERY_DEFAULTS" && . "$CELERY_DEFAULTS"

# Set CELERY_CREATE_DIRS to always create log/pid dirs.
CELERY_CREATE_DIRS=${CELERY_CREATE_DIRS:-0}
CELERY_CREATE_RUNDIR=$CELERY_CREATE_DIRS
CELERY_CREATE_LOGDIR=$CELERY_CREATE_DIRS
if [ -z "$CELERYD_PID_FILE" ]; then
    CELERYD_PID_FILE="$DEFAULT_PID_FILE"
    CELERY_CREATE_RUNDIR=1
fi
if [ -z "$CELERYD_LOG_FILE" ]; then
    CELERYD_LOG_FILE="$DEFAULT_LOG_FILE"
    CELERY_CREATE_LOGDIR=1
fi

CELERYD_LOG_LEVEL=${CELERYD_LOG_LEVEL:-${CELERYD_LOGLEVEL:-$DEFAULT_LOG_LEVEL}}
CELERYD_MULTI=${CELERYD_MULTI:-"celeryd-multi"}
CELERYD=${CELERYD:-$DEFAULT_CELERYD}
CELERYD_NODES=${CELERYD_NODES:-$DEFAULT_NODES}

export CELERY_LOADER

if [ -n "$2" ]; then
    CELERYD_OPTS="$CELERYD_OPTS $2"
fi

CELERYD_LOG_DIR=`dirname $CELERYD_LOG_FILE`
CELERYD_PID_DIR=`dirname $CELERYD_PID_FILE`

# Extra start-stop-daemon options, like user/group.
if [ -n "$CELERYD_USER" ]; then
    DAEMON_OPTS="$DAEMON_OPTS --uid=$CELERYD_USER"
fi
if [ -n "$CELERYD_GROUP" ]; then
    DAEMON_OPTS="$DAEMON_OPTS --gid=$CELERYD_GROUP"
fi

if [ -n "$CELERYD_CHDIR" ]; then
    DAEMON_OPTS="$DAEMON_OPTS --workdir=$CELERYD_CHDIR"
fi


check_dev_null() {
    if [ ! -c /dev/null ]; then
        echo "/dev/null is not a character device!"
        exit 75  # EX_TEMPFAIL
    fi
}


maybe_die() {
    if [ $? -ne 0 ]; then
        echo "Exiting: $* (errno $?)"
        exit 77  # EX_NOPERM
    fi
}

create_default_dir() {
    if [ ! -d "$1" ]; then
        echo "- Creating default directory: '$1'"
        mkdir -p "$1"
        maybe_die "Couldn't create directory $1"
        echo "- Changing permissions of '$1' to 02755"
        chmod 02755 "$1"
        maybe_die "Couldn't change permissions for $1"
        if [ -n "$CELERYD_USER" ]; then
            echo "- Changing owner of '$1' to '$CELERYD_USER'"
            chown "$CELERYD_USER" "$1"
            maybe_die "Couldn't change owner of $1"
        fi
        if [ -n "$CELERYD_GROUP" ]; then
            echo "- Changing group of '$1' to '$CELERYD_GROUP'"
            chgrp "$CELERYD_GROUP" "$1"
            maybe_die "Couldn't change group of $1"
        fi
    fi
}


check_paths() {
    if [ $CELERY_CREATE_LOGDIR -eq 1 ]; then
        create_default_dir "$CELERYD_LOG_DIR"
    fi
    if [ $CELERY_CREATE_RUNDIR -eq 1 ]; then
        create_default_dir "$CELERYD_PID_DIR"
    fi
}

create_paths() {
    create_default_dir "$CELERYD_LOG_DIR"
    create_default_dir "$CELERYD_PID_DIR"
}

export PATH="${PATH:+$PATH:}/usr/sbin:/sbin"


_get_pid_files() {
    [ ! -d "$CELERYD_PID_DIR" ] && return
    echo `ls -1 "$CELERYD_PID_DIR"/*.pid 2> /dev/null`
}

stop_workers () {
    $CELERYD_MULTI stopwait $CELERYD_NODES --pidfile="$CELERYD_PID_FILE"
    sleep $SLEEP_SECONDS
}


start_workers () {
    $CELERYD_MULTI start $CELERYD_NODES $DAEMON_OPTS        \
                         --pidfile="$CELERYD_PID_FILE"      \
                         --logfile="$CELERYD_LOG_FILE"      \
                         --loglevel="$CELERYD_LOG_LEVEL"    \
                         --cmd="$CELERYD"                   \
                         $CELERYD_OPTS
    sleep $SLEEP_SECONDS
}


restart_workers () {
    $CELERYD_MULTI restart $CELERYD_NODES $DAEMON_OPTS      \
                           --pidfile="$CELERYD_PID_FILE"    \
                           --logfile="$CELERYD_LOG_FILE"    \
                           --loglevel="$CELERYD_LOG_LEVEL"  \
                           --cmd="$CELERYD"                 \
                           $CELERYD_OPTS
    sleep $SLEEP_SECONDS
}

check_status () {
    local pid_files=
    pid_files=`_get_pid_files`
    [ -z "$pid_files" ] && echo "celeryd not running (no pidfile)" && exit 1

    local one_failed=
    for pid_file in $pid_files; do
        local node=`basename "$pid_file" .pid`
        local pid=`cat "$pid_file"`
        local cleaned_pid=`echo "$pid" | sed -e 's/[^0-9]//g'`
        if [ -z "$pid" ] || [ "$cleaned_pid" != "$pid" ]; then
            echo "bad pid file ($pid_file)"
        else
            local failed=
            kill -0 $pid 2> /dev/null || failed=true
            if [ "$failed" ]; then
                echo "celeryd (node $node) (pid $pid) is stopped, but pid file exists!"
                one_failed=true
            else
                echo "celeryd (node $node) (pid $pid) is running..."
            fi
        fi
    done

    [ "$one_failed" ] && exit 1 || exit 0
}


case "$1" in
    start)
        check_dev_null
        check_paths
        start_workers
    ;;

    stop)
        check_dev_null
        check_paths
        stop_workers
    ;;

    reload|force-reload)
        echo "Use restart"
    ;;

    status)
        check_status
    ;;

    restart)
        check_dev_null
        check_paths
        restart_workers
    ;;
    try-restart)
        check_dev_null
        check_paths
        restart_workers
    ;;
    create-paths)
        check_dev_null
        create_paths
    ;;
    check-paths)
        check_dev_null
        check_paths
    ;;
    *)
        echo "Usage: /etc/init.d/celeryd {start|stop|restart|kill|create-paths}"
        exit 64  # EX_USAGE
    ;;
esac

exit 0

Also, I executed the init script with the following command: sh -x /etc/init.d/celeryd start, as suggested in the documentation, and this is the output:

# sh -x /etc/init.d/celeryd start
+ SLEEP_SECONDS=5
+ DEFAULT_PID_FILE=/var/run/celery/%n.pid
+ DEFAULT_LOG_FILE=/var/log/celery/%n.log
+ DEFAULT_LOG_LEVEL=INFO
+ DEFAULT_NODES=celery
+ DEFAULT_CELERYD=-m celery.bin.celeryd_detach
+ CELERY_DEFAULTS=/etc/default/celeryd
+ test -f /etc/default/celeryd
+ . /etc/default/celeryd
+ CELERYD_NODES=w1 w2 w3 w4 w5 w6 w7 w8
+ CELERYD_OPTS=--time-limit=300 --concurrency=8
+ CELERYD_CHDIR=/srv/www/web-system/myproject
+ CELERYD_LOG_FILE=/srv/www/web-system/logs/celery/%n.log
+ CELERYD_PID_FILE=/srv/www/web-system/pids/celery/%n.pid
+ CELERYD_LOG_LEVEL=INFO
+ CELERYD_MULTI=/srv/www/web-system/myproject/manage.py celeryd_multi
+ CELERYCTL=/srv/www/web-system/myproject/manage.py celeryctl
+ CELERYD_USER=myproject
+ CELERYD_GROUP=myproject
+ export DJANGO_SETTINGS_MODULE=myproject.settings
+ CELERY_CREATE_DIRS=0
+ CELERY_CREATE_RUNDIR=0
+ CELERY_CREATE_LOGDIR=0
+ [ -z /srv/www/web-system/pids/celery/%n.pid ]
+ [ -z /srv/www/web-system/logs/celery/%n.log ]
+ CELERYD_LOG_LEVEL=INFO
+ CELERYD_MULTI=/srv/www/web-system/myproject/manage.py celeryd_multi
+ CELERYD=-m celery.bin.celeryd_detach
+ CELERYD_NODES=w1 w2 w3 w4 w5 w6 w7 w8
+ export CELERY_LOADER
+ [ -n  ]
+ dirname /srv/www/web-system/logs/celery/%n.log
+ CELERYD_LOG_DIR=/srv/www/web-system/logs/celery
+ dirname /srv/www/web-system/pids/celery/%n.pid
+ CELERYD_PID_DIR=/srv/www/web-system/pids/celery
+ [ -n myproject ]
+ DAEMON_OPTS= --uid=myproject
+ [ -n myproject ]
+ DAEMON_OPTS= --uid=myproject --gid=myproject
+ [ -n /srv/www/web-system/myproject ]
+ DAEMON_OPTS= --uid=myproject --gid=myproject --workdir=/srv/www/web-system/myproject
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/sbin:/sbin
+ check_dev_null
+ [ ! -c /dev/null ]
+ check_paths
+ [ 0 -eq 1 ]
+ [ 0 -eq 1 ]
+ start_workers
+ /srv/www/web-system/myproject/manage.py celeryd_multi start w1 w2 w3 w4 w5 w6 w7 w8 --uid=myproject --gid=myproject --workdir=/srv/www/web-system/myproject --pidfile=/srv/www/web-system/pids/celery/%n.pid --logfile=/srv/www/web-system/logs/celery/%n.log --loglevel=INFO --cmd=-m celery.bin.celeryd_detach --time-limit=300 --concurrency=8
celeryd-multi v3.0.21 (Chiastic Slide)
> Starting nodes...
    > w1.myproject: OK
    > w2.myproject: OK
    > w3.myproject: OK
    > w4.myproject: OK
    > w5.myproject: OK
    > w6.myproject: OK
    > w7.myproject: OK
    > w8.myproject: OK
+ sleep 5
+ exit 0

Then, when I check the pids directory, it is empty, and ps aux shows no celery processes running. There is nothing in the logs either. I'm not using virtualenv. It just stopped working. The version of django-celery is 3.0.21. Here's my wsgi script:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import sys

path = '/srv/www/web-system/'
if path not in sys.path:
    sys.path.append(path)
    sys.path.append(path + 'myproject/')

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'

import djcelery
djcelery.setup_loader()

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

And these are my djcelery-related settings:

# For Celery & RabbitMQ:
# Create user and vhost for RabbitMQ with the following commands:
#
# $ sudo rabbitmqctl add_user myproject <mypassword>
# $ sudo rabbitmqctl add_vhost myproject
# $ sudo rabbitmqctl set_permissions -p myproject myproject ".*" ".*" ".*"
# Format: amqp://user:password@host:port/vhost
BROKER_URL = 'amqp://myproject:mypassword@localhost:5672/myproject'
import djcelery
djcelery.setup_loader()

Any suggestions would be really appreciated. Thanks in advance.


There are 2 answers

3
CamHart On BEST ANSWER

There's probably an error in your code. Try running it manually using

celery worker -A appname

If it throws an error, then you know that's what's wrong with it.
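
For the django-celery setup in the question, a foreground run could be sketched as below. This is only a sketch under assumptions: it assumes the project is started through manage.py with the celeryd management command that django-celery 3.0 provides, the helper name foreground_worker_cmd is made up for illustration, and the default path and log level are simply copied from the question's /etc/default/celeryd. The point is that nothing is detached, so any import or settings error prints straight to the terminal instead of disappearing.

```shell
#!/bin/sh
# Sketch: build the command for ONE worker in the foreground, using the
# same settings the init script reads. Because nothing is daemonized,
# startup errors are printed to the terminal.

# Defaults copied from the question's /etc/default/celeryd (assumptions
# for illustration; adjust to the real project).
CELERYD_CHDIR="${CELERYD_CHDIR:-/srv/www/web-system/myproject}"
CELERYD_LOG_LEVEL="${CELERYD_LOG_LEVEL:-INFO}"

# Hypothetical helper: print the foreground command so it can be
# reviewed before it is run.
foreground_worker_cmd() {
    echo "$CELERYD_CHDIR/manage.py celeryd --loglevel=$CELERYD_LOG_LEVEL --concurrency=1"
}

# Show the command. To actually run it as the daemon's user, something
# like the following should work (assuming sudo is available):
#   cd "$CELERYD_CHDIR" && sudo -u myuser sh -c "$(foreground_worker_cmd)"
foreground_worker_cmd
```

Running it as the same user the init script uses matters, because a worker that starts fine as root can still fail as an unprivileged user.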

1
anabeto93 On

It most likely has to do with memory on your system. My info logs showed:

[2017-08-02 10:00:32,004: CRITICAL/MainProcess] Unrecoverable error: OSError(12, 'Cannot allocate memory')
Traceback (most recent call last)

I was just debugging the same problem on my own setup, thanks to @Adriaan
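
To judge whether memory is a plausible cause, it may help to count how many processes the configuration asks for and to search the worker logs for that error. The sketch below assumes the node list and --concurrency value from the question's /etc/default/celeryd; the function names are hypothetical. Each node gets its own pool, so the pool size is multiplied by the number of nodes (and each node also has one main process on top of that).

```shell
#!/bin/sh
# Sketch: estimate the number of pool processes requested and look for
# out-of-memory errors in the worker logs.

# $1: space-separated node names, $2: the --concurrency value.
# Prints nodes * concurrency (main processes are not counted).
total_pool_processes() {
    nodes=$1
    concurrency=$2
    count=0
    for _node in $nodes; do
        count=$((count + 1))
    done
    echo $((count * concurrency))
}

# $1: log directory. Prints the *.log files that contain the error
# text quoted in the answer above; prints nothing if none do.
logs_with_memory_errors() {
    grep -l "Cannot allocate memory" "$1"/*.log 2>/dev/null
}

# Values from the question: 8 nodes, --concurrency=8.
total_pool_processes "w1 w2 w3 w4 w5 w6 w7 w8" 8   # prints 64
```

With the question's paths, logs_with_memory_errors /srv/www/web-system/logs/celery would list any node logs that recorded the error. If the count is far more than the machine can hold, reducing CELERYD_NODES or --concurrency is a cheap thing to try.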