Efficiently delete orphaned m2m objects/tags in Django

1.3k views Asked by At

I have two models - Photo and Tag - which are connected via a ManyToManyField.

class Photo(models.Model):
    tags = models.ManyToManyField(Tag)

class Tag(models.Model):
    lang = models.CharField(max_length=2)
    name_es = models.CharField(max_length=40)
    name_en = models.CharField(max_length=40)

Every once in a while, we get orphaned tags, that are not referenced any more by any photo. Is there an efficient way of deleting those tags? I know about this answer: Django: delete M2M orphan entries?

And our solution looks like this at the moment:

for tag in Tag.objects.all():
    if not tag.photo_set.select_related(): tag.delete()

However, with increasing database, the runtime of this script is becoming distressingly high :-P Is there an efficient way of getting a list of all tag IDs from the tags table and then a list of all tag IDs from the many-to-many table to create an intersection list?

3

There are 3 answers

0
okm On BEST ANSWER

Try sub-query w/ intermediate table

qs = Tag.objects.exclude(pk__in=Book.tags.through.objects.values('tag'))

# then you could
qs.delete()

# or if you need to trigger signal per item
for x in qs:
    x.delete()
3
Guillaume Lebreton On

I found a way to do this "in real time":

from django.db.models.signals import m2m_changed
from django.dispatch import receiver

class Photo(models.Model):
    tags = models.ManyToManyField(Tag)

class Tag(models.Model):
    lang = models.CharField(max_length=2)

@receiver(m2m_changed, sender=Photo.tags.through)
def delete_orphean_dateranges(sender, **kwargs):
    # when something is removed from the m2m:
    if kwargs['action'] == 'post_remove':  
        Tag.objects.filter(pk__in=kwargs['pk_set'], photo_set=None).delete()
        # select removed tags and check if they are not linked
        # to any Photo, and delete it

This way, each time you edit the Photo's m2m when deleting a tag from the m2m, this function is called.

0
Simon Steinberger On

We've had to further improve performance of this task, so I modified okm's solution a bit:

    all_tag_pks = Tag.objects.values_list('pk', flat=True)
    used_tag_pks = Photo.tags.through.objects.values_list('tag', flat=True)
    Tag.objects.filter(pk__in=list(set(all_tag_pks) - set(used_tag_pks))).delete()

By that, the query to the database gets a lot smaller and faster.