Iterating through a unicode string in Python

Question

7.3k views Asked by charpi At 18 September 2024 at 11:36

I have a problem iterating through unicode strings, character by character, in Python.

print "w: ",word
for c in word:
    print "word: ",c

This is my output:

w:  文本
word:  ? 
word:  ?
word:  ?
word:  ?
word:  ?
word:  ?

My desired output is:

文
本

When I call len(word) I get 6, so apparently each character is stored as 3 bytes.

So my unicode string is successfully stored in the variable, but I cannot get the characters out. I have tried using encode('utf-8'), decode('utf-8') and codecs, but still cannot get good results. This seems like a simple problem, but it is frustratingly hard for me.

I hope someone can point me in the right direction.

Thanks!

There are 4 answers

charpi On 22 June 2015 at 03:43

This is the code that worked for me:

import codecs

fileContent = codecs.open('fileName.txt', 'r', encoding='utf-8')
#...split by whitespace to get words..
for c in word:
    print(c.encode('utf-8'))
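
For readers on Python 3, a rough equivalent of the approach above is sketched below. It assumes Python 3, where the built-in open() accepts an encoding argument and returns text directly. The file name sample.txt and its contents are made up here only so the example is self-contained.

```python
import os
import tempfile

# Hypothetical sample file, created only so this sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("文本 text\n")

chars = []
with open(path, "r", encoding="utf-8") as f:
    # read() returns already-decoded text, so split() yields str words
    for word in f.read().split():
        for c in word:
            chars.append(c)  # each c is one code point, not one byte
            print(c)
```

Because the file is decoded while it is read, no encode() call is needed before printing, provided the terminal can display the characters.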

Tsing On 22 June 2015 at 03:43

You should convert the word from a byte string (str) to unicode:

print "w: ",word
for c in word.decode('utf-8'):
    print "word: ",c
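
The same idea can be illustrated in Python 3, where bytes and text are separate types. This is only a sketch: the variable raw stands in for the Python 2 byte string from the question, and the names raw, text and chars are chosen here for illustration.

```python
# Assumes Python 3. `raw` plays the role of the undecoded Python 2 `word`.
raw = "文本".encode("utf-8")

print(len(raw))    # 6: each of these two characters takes 3 bytes in UTF-8

text = raw.decode("utf-8")
print(len(text))   # 2: after decoding, the length counts characters

chars = list(text)
for c in text:
    print("word:", c)
```

This would explain the len(word) == 6 in the question: the loop there walks over the individual bytes of the UTF-8 encoding, and a single byte of a multi-byte sequence cannot be printed as a character.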

DevB2F On 23 January 2019 at 19:29

For Python 3, this is what works:

import unicodedata

word = "文本"
word = unicodedata.normalize('NFC', word)
for char in word:
    print(char)
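
As a hedged aside on what the normalize call contributes: for "文本" it appears to change nothing, since those characters have no decomposed form. It matters for text that contains combining characters. The sketch below assumes Python 3, and the example string is chosen here for illustration only.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character, two code points
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))        # 2
print(len(composed))          # 1
print(composed == "\u00e9")   # True: NFC yields the single code point U+00E9

# For the string in the question, NFC normalization is a no-op
unchanged = unicodedata.normalize("NFC", "文本") == "文本"
print(unchanged)              # True
```

Without normalization, iterating over the decomposed string would yield the base letter and the accent as two separate items.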

Pruthvi Raj (Accepted Answer) On 22 June 2015 at 03:15

# -*- coding: utf-8 -*-
word = "文本"
print(word)
# Python 2: decode the byte string to unicode before iterating
for each in unicode(word, "utf-8"):
    print(each)

Output:

文本
文
本
