How to prevent storage of celery chord counter objects in database


I am experiencing an issue where the django_celery_results_chordcounter table fills up fast, making me run out of server space: it grew from a few MBs to over 99GB.

I have tried resolving this by setting CELERY_RESULT_EXPIRES=60, hoping that the Celery backend cleanup task would clean up the table every minute, but that was not happening.
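For reference, this is the relevant line in settings.py (a minimal sketch, assuming the standard CELERY_ namespace is configured via app.config_from_object):

# settings.py
# Maps to Celery's result_expires; expired results only become
# eligible for deletion when celery.backend_cleanup actually runs.
CELERY_RESULT_EXPIRES = 60  # seconds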

I ran the task, and by the time the table had grown to about 7GB I truncated it from the psql shell. This is definitely not a solution, but I had to do it so the task could complete without increasing server resources.
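The manual workaround was a one-off TRUNCATE; a sketch of the equivalent run through Django's database connection (assuming the default table name created by django-celery-results):

# one-off, irreversible cleanup; equivalent to TRUNCATE in the psql shell
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute('TRUNCATE TABLE "django_celery_results_chordcounter"')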

Here are the Celery tasks leading to this problem. The number of items can range from hundreds of thousands to millions.

Server specs: 16vCPUs, 64GiB memory

from celery import chord, group

from myapp.celery import celery_app  # hypothetical import path
from myapp.models import Item  # hypothetical import path


@celery_app.task(ignore_result=True)
def get_for_one(item_id):
    # an IO-bound task
    pass


@celery_app.task(ignore_result=True)
def get_for_many(parent_id):
    # header: one signature per matching Item, streamed with .iterator()
    tasks = [
        group(
            get_for_one.s(item.id)
            for item in Item.objects.filter(
                owner__isnull=True, parent_id=parent_id
            ).iterator()
        )
    ]
    # get_for_many_callback is defined elsewhere in the project
    chord(tasks)(get_for_many_callback.si(parent_id))

Versions:

celery==5.2.7
Django==4.1.1
django-celery-beat==2.4.0
django-celery-results==2.4.0

1 Answer

Answered by knaperek:

By default, Celery runs the built-in cleanup periodic task (celery.backend_cleanup) daily at 4 am, so results won't necessarily be removed right after they expire; they wait until the next scheduled cleanup.

If you want to run the cleanup task more often, you can schedule your own interval in CELERY_BEAT_SCHEDULE:

from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    'Custom Celery result cleanup': {
        'task': 'celery.backend_cleanup',
        'schedule': timedelta(seconds=60),
    },
    # ... your other schedules
}
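
Since the question already uses django-celery-beat, the same schedule can alternatively be created through its database-backed models; a minimal sketch using the documented IntervalSchedule/PeriodicTask API (run once, e.g. from a shell or data migration):

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# reuse or create a 60-second interval
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=60,
    period=IntervalSchedule.SECONDS,
)

# point the built-in cleanup task at that interval
PeriodicTask.objects.get_or_create(
    name='Custom Celery result cleanup',
    task='celery.backend_cleanup',
    defaults={'interval': schedule},
)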