We have a Django application that uses django-river for workflow management. We need to insert several rows into each of a couple of tables. Initially we used the normal .save() method, and the workflow worked as expected because the post_save signals were being sent properly. To improve performance we moved to bulk_create, which brought the insert time down from minutes to seconds. However, django-river stopped working, because bulk_create does not send post_save signals by default. We had to send the signals ourselves, based on the available documentation.
    class CustomManager(models.Manager):
        def bulk_create(self, items, ....):
            super().bulk_create(...)
            for i in items:
                [......]  # code to send signal
And
    class Task(models.Model):
        objects = CustomManager()
        ....
This got the workflow working again, but sending the signals takes time, and this wipes out the performance gained with bulk_create. Is there a way to make sending the signals faster?
More details
    from django.db import models
    from django.db.models.signals import post_save


    def post_save_fn(obj):
        # Send post_save for one object, flagged as newly created.
        post_save.send(obj.__class__, instance=obj, created=True)


    class CustomManager(models.Manager):
        def bulk_create(self, objs, **kwargs):
            data_obj = super().bulk_create(objs, **kwargs)
            for i in data_obj:
                # Commented-out threaded variant:
                # t1 = threading.Thread(target=post_save_fn, args=(i,))
                # t1.start()
                post_save.send(i.__class__, instance=i, created=True)
            return data_obj


    class Test(Base):  # Base is defined elsewhere (not shown)
        test_name = models.CharField(max_length=100)
        test_code = models.CharField(max_length=50)
        objects = CustomManager()

        class Meta:
            db_table = "test_db"
What is the problem?
As others have mentioned in the comments, the problem is that the functions called via post_save take a long time. (Remember that signals are not async! This is a common misconception.)
I'm not familiar with django-river, but taking a quick look at the functions that get called post-save (see here and here), we can see that they involve additional calls to the database.
Whilst you save a lot of individual db hits by using bulk_create, you are still calling the database multiple times for each post_save signal.
What can be done about it?
In short: not much! For the vast majority of Django requests, the slow part is calling the database. This is why we try to minimise the number of calls to the db (using things like bulk_create).
Reading through the first few paragraphs of django-river's documentation, the whole idea is to move things that would normally be in code into the database. The big advantage is that you don't need to re-write and re-deploy code so often. The disadvantage is that you inevitably have to refer to the database more, which slows things down. This will be fine for some use cases, but not all.
There are two things I can think of which might help:
Re-write what the post_save signals are doing in your own function, and do it more efficiently. But this will definitely depend upon your data and your app, and will move away from the philosophy of django-river.
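To make the "do it more efficiently" idea concrete, here is a minimal sketch of the general pattern: replace N per-object handlers (one database call each) with a single batched step. It is not django-river's code. The standard-library sqlite3 module stands in for the ORM, and the approval table, the CountingDB wrapper, and both handler functions are hypothetical names made up for illustration. The calls counter only counts calls made from Python, which is a rough proxy for database round trips.

```python
import sqlite3


class CountingDB:
    """Hypothetical wrapper that counts how many calls we make to the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.calls = 0
        # Setup is done on the raw connection, so it is not counted.
        self.conn.execute("CREATE TABLE approval (task_id INTEGER, status TEXT)")

    def execute(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)

    def executemany(self, sql, rows):
        self.calls += 1
        return self.conn.executemany(sql, rows)

    def row_count(self):
        return self.conn.execute("SELECT COUNT(*) FROM approval").fetchone()[0]


def per_object_handler(db, task_id):
    # Mimics a post_save-style receiver: one write for every saved object.
    db.execute("INSERT INTO approval VALUES (?, ?)", (task_id, "pending"))


def batched_handler(db, task_ids):
    # Same end result, but all rows are handed to the database in one call.
    rows = [(task_id, "pending") for task_id in task_ids]
    db.executemany("INSERT INTO approval VALUES (?, ?)", rows)


task_ids = list(range(1, 501))

slow = CountingDB()
for task_id in task_ids:  # what "send a signal per object" amounts to
    per_object_handler(slow, task_id)

fast = CountingDB()
batched_handler(fast, task_ids)

print(slow.calls, fast.calls, slow.row_count() == fast.row_count())
```

In a Django app, the batched step would typically be a second bulk_create on whatever related model the receivers populate, called once after the first bulk_create. Whether that is possible depends on what the receivers actually do for your workflow.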