I'm generating file names from a list pulled out from a postgres DB with Python 2.7.9. In this list there are words with special char. Normally I use ''.join()
to record the name and fire it to my loader but I have just one name that want be recognized. the .py is set for utf-8 coding, but the words are in Portuguese, I think latin-1 coding.
from pydub import AudioSegment
from pydub.playback import play
templist = ['+ Orégano','- Búfala','+ Rúcola']
count_ins = (len(templist)-1)
while (count_ins >= 0 ):
kot_istructions = AudioSegment.from_ogg('/home/effe/voice_orders/Voz/'+"".join(templist[count_ins])+'.ogg')
count_ins-=1
play(kot_istructions)
The first two files are loaded:
/home/effe/voice_orders/Voz/+ Orégano.ogg
/home/effe/voice_orders/Voz/- Búfala.ogg
The third should be:
/home/effe/voice_orders/Voz/+ Rúcola.ogg
But python is trying to load
/home/effe/voice_orders/Voz/+ R\xc3\xbacola.ogg
Why just this one? I've tried to use normalize()
to remove the accent but since this is a string the method didn't work.
Print works well, as db update. Just file name creation doesn't works as expected.
Suggestions?
It seems the root cause might be that the encoding of these names in inconsisitent within your database.
If you run:
You get
which is in fact a Python
unicode
, correctly representing the name. So, what should you do? Although it's a really unclean programming style, you could play.encode()/.decode()
whackamole, where youtry
to decode the raw string from your db usingutf-8
, and failing that,latin-1
. It would look something like this:As a general rule, always work with clean unicode objects within your own source, and only convert to an encoding on saving it out. Also, don't let people insert data into a database without specifying the encoding, as that will stop you from having this problem in the first place.
Hope that helps!