Python - Google Content API and Unicode


I've searched through so many pages trying to help myself that I'm now more confused about Python 2 and Unicode than I was before I started.

What I'm trying to achieve:

Using the Google Content API v2 for Python, I've written an implementation that takes products from our database and posts them to Google.

This works fine until I get to some products which have unicode characters in them.

An example and the errors returned from google/python are:

D' Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53 
Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15 

ERROR'utf8' codec can't decode byte 0x92 in position 1: invalid start byte
ERROR'utf8' codec can't decode byte 0x99 in position 35: invalid start byte

I know it's the ' ® ™ characters, but I can't work out the .encode / .decode side of it.

So, can anyone tell me how I can take these product names with special characters in them so that I can post them to Google?

== update == I'm getting the product names from a MySQL db. The table is set to use UTF-8 as the encoding.


There are 2 answers

3
Yurippenet On

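Try unicode literals (Python 2 needs a "# -*- coding: utf-8 -*-" line at the top of the source file for the non-ASCII characters):

u'Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53'
u'Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15'

or decode the byte strings with an explicit encoding:

unicode('Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53', 'utf-8')
unicode('Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15', 'utf-8')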

That aside, Unicode support in Python 2 is often a pain. I recommend trying Python 3, where every str is Unicode by default.
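For data that comes from the database rather than from literals, here is a minimal sketch of what the error messages seem to point to. 0x92 and 0x99 are not valid UTF-8 start bytes, but in Windows-1252 (cp1252) they are ’ and ™, so the bytes reaching Python are probably cp1252 rather than UTF-8, whatever the table encoding says. That is an assumption, and to_unicode below is a hypothetical helper, not part of any Google library. It tries UTF-8 first and falls back to cp1252:

```python
# -*- coding: utf-8 -*-
def to_unicode(raw, fallback='cp1252'):
    """Return text for raw bytes: try UTF-8 first, then a fallback encoding.

    Hypothetical helper. The cp1252 fallback is an assumption based on the
    0x92 / 0x99 bytes reported in the error messages.
    """
    if not isinstance(raw, bytes):
        return raw  # already text (unicode in Python 2, str in Python 3)
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Byte strings shaped like the ones in the question (0x92 at position 1).
name = to_unicode(b'D\x92Addario EXP11 Coated Bronze')
trademark = to_unicode(b'Start Playing\x99')
```

Both results are unicode objects, so they can be encoded as UTF-8 when the request is built. If the driver is MySQLdb (an assumption, the question does not say), passing charset='utf8' and use_unicode=True to connect() should make it hand back unicode objects directly, which may remove the need for the fallback.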

0
Alex Hellier On

Python 3 is the answer :) (Google now supports it in their SDK)