Issue using Django API for bulk creating and updating records from CSV

I am new to Django and working on bulk creating and updating rows in my database from a CSV file. I am using this package:

https://pypi.org/project/django-bulk-update-or-create/#description

I can get the first option, bulk_update_or_create, working fine. But when I use the bulk option, bulk_update_or_create_context, it uploads data into my database as numbers only (example below). I am missing something really obvious here, but I cannot figure it out.

I would expect data like this, for example (this is what I get using the first option, bulk_update_or_create):

vsim_iccid - 8997103118112597732F (pk)
country_or_region - AE_UNITED ARAB EMIRATES
operator - DU
vsim_imsi - 424030246932624
online_country - AE
sim_status - Enable
plmn_set - 42403
package1 - UAE 50
package2 - blank

Instead, I get this when I use the bulk option, bulk_update_or_create_context:

vsim_iccid - 1
country_or_region - 21
vsim_imsi - 21
online_country - 21
sim_status - 21
plmn_set - 21
package1 - 21
package2 - 21

code:

def upload_vsim_mgmt(request):
    form = CsvModelForm(request.POST or None, request.FILES or None)
    if form.is_valid():
        form.save()
        form = CsvModelForm()
        obj = Csv.objects.get(activated=False)
        with open(obj.file_name.path, 'r') as f:
            df = pd.read_csv(
                f, encoding='latin1', error_bad_lines=False,
                index_col=False, dtype='unicode', sep=',',
            ).replace(np.nan, '', regex=True).replace("\t", '', regex=True)
            #print(df)
            row_iter = df.iterrows()
            items = [
                VSIMData(
                    country_or_region=row['Country or Region    '],
                    operator=row['Operator  '],
                    vsim_imsi=row['IMSI '],
                    vsim_iccid=row['ICCID   '],
                    online_country=row['Online Country or Region    '],
                    sim_status=row['SIM Status  '],
                    plmn_set=row['PLMN Set  '],
                    package1=row['Package 1 '],
                    package2=row['Package 2 '],
                )
                for index, row in row_iter
            ]
        with VSIMData.objects.bulk_update_or_create_context(['operator','country_or_region', 'vsim_imsi', 'online_country', 'sim_status', 'plmn_set','package1', 'package2'], match_field='vsim_iccid', batch_size=100) as bulkit:
            for i in range(10000):
                bulkit.queue(VSIMData(vsim_iccid=i, operator=i+20, country_or_region=i+20, vsim_imsi=i+20, online_country=i+20, sim_status=i+20, plmn_set=i+20, package1=i+20, package2=i+20))
        obj.activated = True
        obj.save()

model:

class VSIMData(models.Model):
    objects = BulkUpdateOrCreateQuerySet.as_manager()
    country_or_region = models.CharField(max_length=250, blank=True, null=True)
    operator = models.CharField(max_length=50, blank=True, null=True)
    vsim_imsi = models.CharField(max_length=20, blank=True, null=True)
    vsim_iccid = models.CharField(max_length=20, unique=True, primary_key=True)
    online_country = models.CharField(max_length=2, blank=True, null=True)
    sim_status = models.CharField(max_length=50, blank=True, null=True)
    plmn_set = models.CharField(max_length=250, blank=True, null=True)
    package1 = models.CharField(max_length=50, blank=True, null=True)
    package2 = models.CharField(max_length=50, blank=True, null=True)


    def __str__(self):
        return self.vsim_iccid

Any help with where I am going wrong would be awesome, thanks!

There is 1 answer

Answer by Cameron Cairns

Making this an answer since it's a little too large for a comment. I read the linked package page and frankly I'm puzzled. Both the example on the package page and your code fail to reference the items variable.

From the package page:

with RandomData.objects.bulk_update_or_create_context(['data'], match_field='uuid', batch_size=10) as bulkit:
    for i in range(10000):
        bulkit.queue(RandomData(uuid=i, data=i + 20))

From the code you posted:

with VSIMData.objects.bulk_update_or_create_context(['operator','country_or_region', 'vsim_imsi', 'online_country', 'sim_status', 'plmn_set','package1', 'package2'], match_field='vsim_iccid', batch_size=100) as bulkit:
    for i in range(10000):
        bulkit.queue(VSIMData(vsim_iccid=i, operator=i+20, country_or_region=i+20, vsim_imsi=i+20, online_country=i+20, sim_status=i+20, plmn_set=i+20, package1=i+20, package2=i+20))
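
Note what this loop actually queues: objects built from the counter i, not from the CSV rows. Here is a minimal sketch in plain Python, where dicts stand in for VSIMData instances and placeholder_row is a hypothetical helper of mine (not part of the package), showing what one iteration produces:

```python
# Plain dicts stand in for VSIMData instances; no Django is needed to see the effect.
UPDATE_FIELDS = ['operator', 'country_or_region', 'vsim_imsi', 'online_country',
                 'sim_status', 'plmn_set', 'package1', 'package2']


def placeholder_row(i):
    """Mirror VSIMData(vsim_iccid=i, operator=i+20, ...) from the posted loop."""
    row = {'vsim_iccid': i}
    # Every non-key field is filled with the same counter-derived number.
    row.update({name: i + 20 for name in UPDATE_FIELDS})
    return row


row = placeholder_row(1)
print(row['vsim_iccid'], row['operator'], row['package2'])
```

For i = 1 this gives vsim_iccid 1 and 21 in every other field, which matches the rows you are seeing in the database.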

The package page's bulk_update_or_create example looks different:

items = [
    RandomData(uuid=1, data='data for 1'),
    RandomData(uuid=2, data='data for 2'),
]
RandomData.objects.bulk_update_or_create(items, ['data'], match_field='uuid')

Here items is passed explicitly. Honestly, I'm thinking that perhaps the code listed on the package page may not actually work. I'd dig into the source code of the library, and maybe drop a pdb breakpoint inside the bulk_update_or_create method to try and figure out what's going on. I'd imagine you'd need to reference the items variable you construct earlier in your example somehow.
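
One thing I would try is queueing the objects built from the CSV instead of the range() counters. The sketch below is only an illustration under assumptions: fake_bulk_context and QueueRecorder are hypothetical stand-ins I made up so it runs without Django, and they imitate nothing beyond the queue() call shown on the package page. Plain dicts stand in for VSIMData instances, and the sample row reuses values from your question.

```python
import csv
import io
from contextlib import contextmanager

# Header names keep the trailing spaces seen in the posted code.
SAMPLE_CSV = (
    "ICCID   ,Operator  ,Package 1 \n"
    "8997103118112597732F,DU,UAE 50\n"
)


class QueueRecorder:
    """Hypothetical stand-in exposing only queue(), like `bulkit` in the question."""

    def __init__(self):
        self.queued = []

    def queue(self, obj):
        self.queued.append(obj)


@contextmanager
def fake_bulk_context():
    # In the real view this would be the package's
    # VSIMData.objects.bulk_update_or_create_context(...) call from the question.
    yield QueueRecorder()


def load_items(text):
    """Build one dict per CSV row, stripping padding from headers and values."""
    items = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key.strip(): value.strip() for key, value in raw.items()}
        items.append({
            'vsim_iccid': row['ICCID'],
            'operator': row['Operator'],
            'package1': row['Package 1'],
        })
    return items


items = load_items(SAMPLE_CSV)
with fake_bulk_context() as bulkit:
    for item in items:  # queue the CSV-built objects, not range() counters
        bulkit.queue(item)
print(bulkit.queued)
```

In your view the equivalent change would be replacing the for i in range(10000) loop with for item in items: bulkit.queue(item), using the list of VSIMData instances you already build. I have not run that against the package itself, so treat it as something to try rather than a confirmed fix.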