Writing umlauts to MariaDB causes non-UTF-8 encoding

I am using MariaDB with server charset UTF-8 Unicode (utf8mb4) and Python 3.7.3, and for some reason beyond me, data from a CSV file that is read in and written to the database is saved in a weird, escaped form:

models.py:

from django.db import models

class Product(models.Model):
    data = models.JSONField()
    store = models.ForeignKey(Store, on_delete=models.CASCADE)  # Store is defined elsewhere in models.py
    number = models.PositiveIntegerField()

When writing to the database from the Django shell:

Product.objects.create(store = store, number = 123, data = {"direction": "süden"})

the row in the database then reads "number": 123, "data": {"direction": "s\u00fcden"}, i.e. the ü is stored as the escape sequence \u00fc.
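The escaped form is not corrupted data: \u00fc is a standard JSON escape for ü, and any JSON parser decodes it back. A quick check with Python's standard json module, using the stored text copied from the question:

```python
import json

# Text as it appears in the database column (copied from the question).
stored = '{"direction": "s\\u00fcden"}'

# Decoding the JSON escape sequence restores the original character.
decoded = json.loads(stored)
print(decoded["direction"])  # süden
print(decoded == {"direction": "süden"})  # True
```

So the value round-trips correctly through JSON; the problem is only that the raw text in the column no longer contains the literal word.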

I already tried setting an encoder for the JSONField:

from django.core.serializers.json import DjangoJSONEncoder
...
data = models.JSONField(encoder = DjangoJSONEncoder)

and then ran the migrations again. A simple test can be done in the admin: searching products for süden returns zero hits. When I manually change the value to süden in the database, the search obviously works. Also, when I search for 123 in the admin, the word süden is displayed correctly.

So my guess is that I need to implement some equivalent of the ensure_ascii=False option of Python's json module?
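For reference, Python's json.dumps escapes every non-ASCII character by default (ensure_ascii=True), while ensure_ascii=False keeps the characters as written. The sketch below uses only the standard library and, as a simplifying assumption, models the admin search as a plain substring test against the serialized text:

```python
import json

data = {"direction": "süden"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)
# With ensure_ascii=False the characters are kept as written.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)   # {"direction": "s\u00fcden"}
print(readable)  # {"direction": "süden"}

# A naive substring match (a stand-in for a text search on the column)
# only finds the word in the unescaped form.
print("süden" in escaped)   # False
print("süden" in readable)  # True
```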

I also wrote the word süden to a CharField, and it shows up correctly in the database, without any escapes or encoding errors.

There are 2 answers

Answer by xtlc

As this took me a while and I got some help in the Django forum, I want to answer the question myself. I was not able to implement my own encoder, and I would still love to see how that works (there are very few examples around), but I managed to solve the task by overriding get_prep_value of Django's JSONField:

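import json
from django.db import models

class MyJSONField(models.JSONField):
    def get_prep_value(self, value):
        if value is None:
            return value
        # Keep non-ASCII characters such as "ü" instead of escaping them to \uXXXX
        return json.dumps(value, cls=self.encoder, ensure_ascii=False)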

The Product model therefore changes to:

class Product(models.Model):
    data   = MyJSONField()
    store  = models.ForeignKey(Store, on_delete = models.CASCADE)
    number = models.PositiveIntegerField()

Migrations must be run again!

Answer by Spatz

DjangoJSONEncoder can be configured simply by overriding its constructor, as follows:

from django.db import models
from django.core.serializers.json import DjangoJSONEncoder


class MyJSONEncoder(DjangoJSONEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # json.dumps() constructs the encoder with ensure_ascii=True by default;
        # switch it off so "ü" is stored as-is instead of as \u00fc.
        self.ensure_ascii = False


class Dummy(models.Model):
    data = models.JSONField(encoder=MyJSONEncoder)
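The constructor override works because json.dumps(value, cls=...) instantiates the given encoder class (passing ensure_ascii=True unless told otherwise) and then calls its encode() method, so an attribute set after super().__init__() wins. A Django-free sketch of the same pattern, substituting the standard library's json.JSONEncoder for DjangoJSONEncoder; NonAsciiEncoder is a hypothetical name used only for illustration:

```python
import json


class NonAsciiEncoder(json.JSONEncoder):
    """Hypothetical stand-in for MyJSONEncoder, built on json.JSONEncoder."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # json.dumps passed ensure_ascii=True; override it after init.
        self.ensure_ascii = False


result = json.dumps({"direction": "süden"}, cls=NonAsciiEncoder)
print(result)  # {"direction": "süden"}
```

Because DjangoJSONEncoder is itself a json.JSONEncoder subclass, the same override should carry over to it unchanged.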