How to Optimize Data Importing to a Django Model using django-import-export library


I have been using the django-import-export library to upload my data from Excel into a Django model, and it worked fine until I had to upload an Excel file with 20,000 rows, which took practically forever to complete.

Can you please suggest the right way to optimize data uploading to the Django model, so that I can easily upload Excel files and have the data saved in my database?

Please support.

Below is the code for admin.py that I tried, but it throws the error 'using_transactions' is not defined. Please confirm where I am going wrong and what I should change to get bulk data imported in less time:

from django.contrib import admin
from import_export import resources
from .models import Station, Customer
from import_export.admin import ImportExportModelAdmin

# Register your models here.

class StationResource(resources.ModelResource):

    def get_or_init_instance(self, instance_loader, row):
        self.bulk_create(self, using_transactions, dry_run, raise_errors,
                         batch_size=1000)

    class Meta:
        model = Station
        use_bulk = True
        batch_size = 1000
        force_init_instance = True


class StationAdmin(ImportExportModelAdmin):
    resource_class = StationResource

admin.site.register(Station, StationAdmin)

And in the settings.py file I have set:

IMPORT_EXPORT_USE_TRANSACTIONS = True
IMPORT_EXPORT_SKIP_ADMIN_LOG = True

There are 2 answers

Nabil Bennani:

I wouldn't use import-export for large data. Instead, I'd save the data as CSV from the Excel file and use pandas to bridge the data into the database; pandas does it in batches.
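
A minimal sketch of that approach, assuming the Excel sheet has already been saved as a CSV and that Station has hypothetical name and code fields matching the CSV headers:

import pandas as pd
from .models import Station

def import_stations(csv_path, chunk_size=1000):
    # Read the CSV in chunks so the whole file is never held in memory at once,
    # then insert each chunk with a single bulk query.
    for chunk in pd.read_csv(csv_path, chunksize=chunk_size):
        Station.objects.bulk_create(
            # Hypothetical columns 'name' and 'code'; adjust to your model's fields.
            [Station(name=row["name"], code=row["code"])
             for row in chunk.to_dict("records")],
            batch_size=chunk_size,
        )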

Matthew Hegarty:

import-export provides a bulk import mode which makes use of Django's bulk operations.

Simply enable the use_bulk flag on your resource:

class Meta:
    model = Book
    fields = ('id', 'name', 'author_email', 'price')
    use_bulk = True
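
Applied to the StationResource from the question, a minimal sketch might look like the following. The get_or_init_instance override can be dropped; the "'using_transactions' is not defined" error came from calling bulk_create by hand with names that do not exist in that method's scope.

from import_export import resources
from .models import Station

class StationResource(resources.ModelResource):
    class Meta:
        model = Station
        use_bulk = True
        batch_size = 1000
        # Skip per-row database lookups when every imported row is new.
        force_init_instance = True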

It should be possible to import 20k rows in a few seconds, but it will depend on your data and you may need to tweak some settings. Also, do read the caveats regarding bulk imports.

There is more detailed information in the repo.

However, even without bulk mode it should be possible to import 20k rows in a few minutes. If it is taking much longer, then the import process is probably making unnecessary reads on the database (i.e. one for each row). Enabling SQL logging will shed some light on this, and CachedInstanceLoader may help.
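
For illustration, a sketch of those last two suggestions, reusing the Station resource from the question (django.db.backends is Django's SQL logger; CachedInstanceLoader ships in import-export's instance_loaders module):

# resources.py: cache existing instances instead of issuing one SELECT per imported row.
from import_export import resources
from import_export.instance_loaders import CachedInstanceLoader
from .models import Station

class StationResource(resources.ModelResource):
    class Meta:
        model = Station
        instance_loader_class = CachedInstanceLoader

# settings.py: log every SQL query to the console (development only, requires DEBUG=True),
# to see whether the import issues a query per row.
LOGGING = {
    "version": 1,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {
        "django.db.backends": {"handlers": ["console"], "level": "DEBUG"},
    },
}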