Sending a "windows-1251"-encoded string in JSON from Python to JavaScript


What I need to do is best described with an example. Previously, I had the following code:

content = u'<?xml version="1.0" encoding="windows-1251"?>\n' + ... #
with open(file_name, 'w') as f:
    # the with block closes the file automatically, so no explicit close is needed
    f.write(content.encode('cp1251'))

Now I want to change the architecture of my entire app: send the string that is supposed to be the file content to the client via JSON, and generate the file in JavaScript.

So, now my code looks something like this:

response_data = {}
response_data['file_content'] = content.encode('cp1251')
response_data['file_name'] = file_name    
return JsonResponse({'content':json.dumps(response_data,  ensure_ascii=False)}) # error generated

The problem is that I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xd4 in position 53: ordinal not in range(128)

I also tried a second approach:

response_data = {}
response_data['file_content'] = content
response_data['file_name'] = file_name    
return JsonResponse({'content':json.dumps(response_data,  ensure_ascii=False).encode('utf8')}) # error generated

Then, on the client, I try to convert UTF-8 to windows-1251:

 $.post ('/my_url/', data, function(response) {
        var file_content = JSON.parse(response.content).file_content;
        file_content = UnicodeToWin1251(file_content);

...but I get garbled characters. I know I am doing something badly wrong here and am probably mixing up encodings, but I have spent an entire day on this without solving it. Could someone give me a hint as to where my mistake is?

Answer by Martijn Pieters (best answer)

Both XML and JSON contain data that is Unicode text. The XML declaration merely tells your XML parser how to decode the XML serialisation of that data. You wrote the serialisation by hand so to match the XML header, you had to encode to CP-1251.

The JSON standard states that all JSON should be encoded in either UTF-8, UTF-16 or UTF-32, with UTF-8 the default; again, this is just the encoding for the serialisation.

Leave your data as Unicode, then encode that data to JSON with the json library; the library takes care of ensuring you get UTF-8 data (in Python 2), or gives you Unicode text (Python 3) that can be encoded to UTF-8 later. Your JavaScript code will then decode the JSON, at which point you have Unicode text again:

response_data = {}
response_data['file_content'] = content
response_data['file_name'] = file_name    
return JsonResponse({'content':json.dumps(response_data,  ensure_ascii=False)})

There is no need whatsoever to send binary data over JSON here; you are sending text. If your JavaScript code then generates the file, it is responsible for encoding to CP-1251, not your Python code.
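
As a rough sketch of that round trip, here both sides are written in Python purely to keep the example self-contained; in the real application the receiving side would be the JavaScript client. The sample `content` value and the names `payload`, `wire_bytes`, `received` and `file_bytes` are assumptions made for illustration:

```python
import json

# Hypothetical sample content; the real string comes from the application.
content = u'<?xml version="1.0" encoding="windows-1251"?>\n<name>Файл</name>'

# Sending side: serialise the Unicode text to JSON, then encode the JSON
# document itself as UTF-8 for transport.
payload = json.dumps({'file_content': content}, ensure_ascii=False)
wire_bytes = payload.encode('utf-8')

# Receiving side: decode the UTF-8 JSON document back into Unicode text.
received = json.loads(wire_bytes.decode('utf-8'))['file_content']

# CP-1251 only comes into play at the point where the file is written.
file_bytes = received.encode('cp1251')
```

The text survives the JSON step unchanged, so the CP-1251 encoding can be deferred to whichever side actually produces the file.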

If you must put binary data in a JSON payload, you'll need to encode that data as some form of text first. Binary data (and CP-1251-encoded text is binary data) could be encoded as text using Base64:

import base64

response_data = {}
# b64encode returns ASCII-only bytes without embedded newlines
response_data['file_content'] = base64.b64encode(content.encode('cp1251')).decode('ascii')
response_data['file_name'] = file_name    
return JsonResponse({'content':json.dumps(response_data,  ensure_ascii=False)})

Base64 data is encoded to a bytestring containing only ASCII data, so decode it as ASCII for the JSON library, which expects text to be Unicode text.

Now you are sending binary data, wrapped in a Base64 text encoding, to the JavaScript client, which then has to decode the Base64 if you need the binary payload there.
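
To illustrate what the receiving side has to undo, here is a minimal sketch of the Base64 round trip, again in Python on both ends only so that it can be run as-is. The sample `content` string and the variable names are assumptions, and `base64.b64encode` is used in this sketch because it produces the Base64 text without line breaks:

```python
import base64
import json

# Hypothetical sample content; the real string comes from the application.
content = u'<?xml version="1.0" encoding="windows-1251"?>\n<name>Файл</name>'

# Sending side: CP-1251 bytes -> Base64 bytes -> ASCII text that JSON can carry.
encoded = base64.b64encode(content.encode('cp1251')).decode('ascii')
payload = json.dumps({'file_content': encoded})

# Receiving side: parse the JSON, then reverse the Base64 step to recover
# the original CP-1251 bytes.
received = json.loads(payload)['file_content']
file_bytes = base64.b64decode(received)
```

The recovered `file_bytes` are already CP-1251-encoded, so they can be written to a file without any further text encoding step.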