Is it ok to pass an OrderedDict as a Celery task argument?


I have a Django REST Framework serializer with an overridden update method.
Since a user can send lots of children in one request, this update hands them to an asynchronous Celery task, process_children, to deal with the kids.

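from django.db import transaction
from rest_framework import serializers

from .tasks import process_children  # module path assumed


class MyModelSerializer(serializers.ModelSerializer):
    ...  # Meta and field declarations omitted

    @transaction.atomic
    def update(self, mymodel, validated_data):
        try:
            children_data = validated_data.pop('children')
            # Queue the task only after the transaction commits, so the
            # worker can see the saved parent row
            transaction.on_commit(lambda: process_children.apply_async(
                countdown=1,
                args=[mymodel.id, children_data]))
        except KeyError:
            # No 'children' key in the payload: nothing to hand off
            pass
        ...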

One of the args, children_data, is not a JSON object but a list of OrderedDicts taken straight from validated_data.
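Why this can work with the JSON serializer: OrderedDict is a subclass of dict, so a JSON encoder writes it out as an ordinary JSON object. On the worker side it comes back as a plain dict (and the list as a plain list), and the task only indexes it with child_data['start'], so nothing breaks. A minimal stdlib sketch, assuming made-up children whose values are strings and numbers; Celery's JSON serializer (from kombu) treats dict subclasses the same way.

```python
import json
from collections import OrderedDict

# Hypothetical payload shaped like children_data: a list of OrderedDicts
children_data = [
    OrderedDict([("name", "Alice"), ("age", 7)]),
    OrderedDict([("name", "Bob"), ("age", 5)]),
]

# OrderedDict is a dict subclass, so json encodes it as a plain JSON object
wire = json.dumps(children_data)
print(wire)  # [{"name": "Alice", "age": 7}, {"name": "Bob", "age": 5}]

# Decoding yields ordinary dicts: key order is kept, the OrderedDict type is not
restored = json.loads(wire)
print(type(restored[0]).__name__)  # dict
print(restored == children_data)   # True
```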

The task looks like:

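from .celery import app  # module paths assumed
from .models import Child, MyModel
# populate_child / create_child: helpers defined elsewhere (not shown)


@app.task
def process_children(mymodel_id, children_data):
    mymodel = MyModel.objects.get(pk=mymodel_id)
    children = mymodel.children.all()
    for child_data in children_data:
        try:
            # Match an existing child on its start value
            child = children.get(start=child_data['start'])
            child = populate_child(child, child_data)
            child.save()
        except Child.DoesNotExist:
            create_child(mymodel, child_data)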

I read that task arguments should only be sent in a serializable format (JSON, or pickle, YAML, whatever...).

  • But this setup seems to work.
  • I can even send datetime objects (i.e. the start attribute I use in the task to match a stored child with the new values sent through the API).
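The datetime case works for a related reason. The plain json module refuses datetime objects, but Celery's JSON serializer (provided by kombu) uses an encoder that knows about them. In the kombu versions that shipped alongside Celery 4, a datetime is written as an ISO 8601 string, so inside the task child_data['start'] is most likely a str, and Django accepts such strings in a DateTimeField lookup. The exact wire format depends on the kombu version, and newer releases can round-trip datetime objects. The stdlib sketch below only mimics the older string behaviour with a default= hook; encode_datetime is a hypothetical helper, not kombu's actual encoder.

```python
import json
from datetime import datetime

child_data = {"start": datetime(2017, 3, 1, 9, 30)}

# The stdlib encoder rejects datetime objects outright
try:
    json.dumps(child_data)
except TypeError as exc:
    print("plain json:", exc)


def encode_datetime(obj):
    """default= hook: turn datetimes into ISO 8601 strings (mimics older kombu)."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


wire = json.dumps(child_data, default=encode_datetime)
print(wire)  # {"start": "2017-03-01T09:30:00"}

# On the worker side the value arrives as a string, not a datetime
received = json.loads(wire)
print(type(received["start"]).__name__)  # str
```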

So what's happening here?

  • Is everything OK, and Celery serializes and deserializes OrderedDict like a boss?
  • Or am I crazy, and should I serialize before invoking the task and deserialize inside it?
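If you go with the second option and want the task to see real datetime objects regardless of which serializer version is installed, one approach is to convert the payload to JSON-safe primitives before calling the task and parse it back inside. A minimal sketch, assuming start is the only datetime-valued field; prepare_children, restore_children and DATETIME_FIELDS are hypothetical names, and datetime.fromisoformat needs Python 3.7+.

```python
from collections import OrderedDict
from datetime import datetime

DATETIME_FIELDS = ("start",)  # assumption: the only datetime-valued field


def prepare_children(children_data):
    """Before apply_async: plain dicts with ISO 8601 strings instead of datetimes."""
    prepared = []
    for child in children_data:
        item = dict(child)
        for field in DATETIME_FIELDS:
            if isinstance(item.get(field), datetime):
                item[field] = item[field].isoformat()
        prepared.append(item)
    return prepared


def restore_children(children_data):
    """Inside the task: parse the ISO strings back into datetime objects."""
    restored = []
    for child in children_data:
        item = dict(child)
        for field in DATETIME_FIELDS:
            if isinstance(item.get(field), str):
                item[field] = datetime.fromisoformat(item[field])
        restored.append(item)
    return restored


original = [OrderedDict([("start", datetime(2017, 3, 1, 9, 30)), ("label", "a")])]
safe = prepare_children(original)  # what you would pass in args=[...]
back = restore_children(safe)      # what process_children would loop over
print(safe)            # [{'start': '2017-03-01T09:30:00', 'label': 'a'}]
print(back == original)  # True
```

In the serializer you would then pass prepare_children(children_data) in args, and start the task body with children_data = restore_children(children_data).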

[UPDATE, adding CELERY settings]

CELERY_BROKER_URL = get_env_variable('REDIS_URL')
CELERY_BROKER_POOL_LIMIT = 0
CELERY_REDIS_MAX_CONNECTIONS = 10
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/London'

Answer by schillingt

You're using the pickle serializer, which handles Python objects relatively well, but there are concerns with it. Here's a blog post on serialization in Celery.
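For comparison, this is what pickle does with a payload like children_data: it reproduces the exact Python types, OrderedDict and datetime included, which JSON does not. That convenience is also the main concern: unpickling untrusted data can execute arbitrary code, which is why Celery's documentation warns about accepting pickle. A stdlib-only sketch with made-up data:

```python
import pickle
from collections import OrderedDict
from datetime import datetime

children_data = [OrderedDict([("start", datetime(2017, 3, 1, 9, 30)), ("label", "a")])]

# pickle records the concrete classes, so they come back unchanged
restored = pickle.loads(pickle.dumps(children_data))
print(type(restored[0]).__name__)           # OrderedDict
print(type(restored[0]["start"]).__name__)  # datetime
print(restored == children_data)            # True
```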

Answer by bak2trak

Yes, you are doing it correctly.

As mentioned in the documentation:

Data transferred between clients and workers needs to be serialized, so every message in Celery has a content_type header that describes the serialization method used to encode it.
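Conceptually, the producer picks a serializer, encodes the arguments, and stamps the message with the matching content type; the worker decodes only the content types listed in its accept setting (CELERY_ACCEPT_CONTENT in the question). The toy registry below is a hypothetical illustration of that idea, not kombu's real API; SERIALIZERS, encode_message and decode_message are made-up names, and only the two content-type strings match what Celery uses for JSON and pickle.

```python
import json
import pickle

# Hypothetical registry: serializer name -> (content type, encoder, decoder)
SERIALIZERS = {
    "json": (
        "application/json",
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda raw: json.loads(raw.decode("utf-8")),
    ),
    "pickle": ("application/x-python-serialize", pickle.dumps, pickle.loads),
}


def encode_message(args, serializer="json"):
    """Producer side: serialize args and tag the body with its content type."""
    content_type, encode, _ = SERIALIZERS[serializer]
    return {"content_type": content_type, "body": encode(args)}


def decode_message(message, accept=("application/json",)):
    """Worker side: refuse content types that are not explicitly accepted."""
    if message["content_type"] not in accept:
        raise ValueError(f"refusing to decode {message['content_type']}")
    for content_type, _, decode in SERIALIZERS.values():
        if content_type == message["content_type"]:
            return decode(message["body"])
    raise ValueError("unknown content type")


msg = encode_message([42, [{"label": "a"}]])
print(decode_message(msg))  # [42, [{'label': 'a'}]]

try:
    decode_message(encode_message([42], serializer="pickle"))
except ValueError as exc:
    print(exc)  # refusing to decode application/x-python-serialize
```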

Also, since Celery 4.0 the default serializer is JSON (it used to be pickle). So whenever you call this task, Celery serializes and deserializes the arguments with JSON by default. If you want to use another serializer, pass it explicitly when calling the task with apply_async; .delay() takes no options, so it always uses the default serializer (JSON here):

process_children.apply_async((mymodel.id, children_data), serializer='pickle')
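One caveat with the call above: with the settings posted in the question, the worker only accepts application/json, so a message sent with serializer='pickle' would be rejected. A sketch of the settings change that would allow it while keeping JSON as the default for every other call (same CELERY_ namespace as the question's settings):

```python
# settings.py: accept pickle in addition to JSON on the worker side
CELERY_ACCEPT_CONTENT = ['application/json', 'application/x-python-serialize']

# JSON remains the default; only calls that pass serializer='pickle' use pickle
CELERY_TASK_SERIALIZER = 'json'
```

Widening the accept list means the worker will unpickle whatever reaches the broker with that content type, so this only makes sense when the broker is trusted.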