How to open a URL with non-UTF-8 arguments


Using Python, I need to send non-UTF-8 encoded data (specifically Shift-JIS) to a URL via the query string. How should I transfer the data? Quote it? Encode it in UTF-8?

Thanks


There are 3 answers

bobince (best answer)

Query string parameters are byte-based. Whilst IRI-to-URI and typed non-ASCII characters will typically use UTF-8, there is nothing forcing you to send or receive your own parameters in that encoding.

So for Shift-JIS (actually typically cp932, the Windows extension of that encoding):

import urllib  # Python 2

foo = u'\u65E5\u672C\u8A9E'  # 日本語
url = 'http://www.example.jp/something?foo=' + urllib.quote(foo.encode('cp932'))

In Python 3 you do it in the quote function itself:

import urllib.parse

foo = '\u65E5\u672C\u8A9E'  # 日本語
url = 'http://www.example.jp/something?foo=' + urllib.parse.quote(foo, encoding='cp932')
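
If you are building a whole query string rather than a single value, something along these lines should also work in Python 3. This is only a sketch: the parameter name foo and the example.jp URL are carried over from above, and the decoding step assumes the receiving side knows the value was sent as cp932.

```python
from urllib.parse import urlencode, parse_qs

foo = '\u65E5\u672C\u8A9E'  # 日本語

# urlencode percent-encodes each value using its cp932 bytes
query = urlencode({'foo': foo}, encoding='cp932')
url = 'http://www.example.jp/something?' + query

# Receiving side (assumed to know the encoding is cp932)
decoded = parse_qs(query, encoding='cp932')['foo'][0]
```

Nothing in the URL itself says which encoding was used, so sender and receiver have to agree on it out of band.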

Tuure Laurinolli

I don't know what Unicode has to do with this, since the query string is a string of bytes. You can use the quoting functions in urllib to quote plain byte strings so that they can be passed within query strings.
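
As a rough illustration in Python 3 (assuming you already hold the data as Shift-JIS bytes; the sample value here is made up for the example), the quoting functions accept bytes directly and can hand bytes back:

```python
from urllib.parse import quote, unquote_to_bytes

# Sample data, already encoded in the target encoding
raw = '\u65E5\u672C\u8A9E'.encode('shift_jis')

quoted = quote(raw)                  # bytes in, percent-encoded ASCII str out
restored = unquote_to_bytes(quoted)  # back to the original bytes
```

No encoding argument is passed to quote here, since the input is already bytes.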

mkluwe

By »query string«, do you mean an HTTP GET request like http://{URL}?data=XYZ?

One option is to encode whatever data you have with base64.b64encode, using -_ as the alternative characters so that the result is URL-safe.
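
A minimal sketch of that idea, assuming the server is under your control and decodes the data parameter itself (the URL and the sample value are placeholders):

```python
import base64

data = '\u65E5\u672C\u8A9E'.encode('cp932')  # the raw bytes to transfer

# altchars=b'-_' swaps '+' and '/' for URL-safe characters
token = base64.b64encode(data, altchars=b'-_').decode('ascii')
url = 'http://www.example.jp/something?data=' + token

# Receiving side: urlsafe_b64decode understands the same alphabet
restored = base64.urlsafe_b64decode(token)
```

For other input lengths the token may end in = padding, which you may want to percent-encode as well.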