Python: calling upper() on words containing non-latin characters

I have a file with words on separate lines, for example:

А
б
Вв
Гг

(and so on, all non-Latin letters).

I want to get this:

А
Б
ВВ
ГГ

but after the code runs I see no changes.

Here is the code:

f = open('sample.csv')
for line in f:
    for sampleword in line.split():
        print sampleword.upper()

Non-Latin characters are not capitalized. What's the problem?

There is 1 answer

Łukasz Rogalski

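The solution for capitalizing non-Latin letters in Python 2 is to use unicode strings. A plain str in Python 2 is a byte string, and its upper() works byte by byte, so it cannot uppercase characters that are encoded as several bytes: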

words = [u'łuk', u'ćma']
assert [w.upper() for w in words] == [u'ŁUK', u'ĆMA']
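
To see why the loop in the question changes nothing, here is a minimal sketch. It assumes the file is UTF-8 encoded (the question does not say), and the variable names are made up for illustration. The byte string stands in for what a plain open() hands you in Python 2:

```python
# -*- coding: utf-8 -*-

# Bytes as they would come from a file read without decoding (UTF-8 assumed).
raw = u'Вв'.encode('utf-8')

# upper() on a byte string leaves the multi-byte Cyrillic characters alone.
unchanged = raw.upper()
assert unchanged == raw

# Decoding first gives a unicode string, and its upper() handles Cyrillic.
decoded_upper = raw.decode('utf-8').upper()
assert decoded_upper == u'ВВ'
```

So the fix is to decode before calling upper(), either by hand as above or by letting the file object do it.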

To read unicode from a file, you can refer to the official Python manual:

Reading Unicode from a file is therefore simple:

import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
    print repr(line)
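
Putting the two pieces together for the file in the question might look roughly like this. It is only a sketch: upper_words is a hypothetical helper name, UTF-8 is an assumed encoding, and io.open is used in place of codecs.open because it behaves the same on Python 2 and Python 3:

```python
import io


def upper_words(path, encoding='utf-8'):
    """Return every whitespace-separated word in the file, uppercased.

    Hypothetical helper; the encoding default is an assumption.
    """
    result = []
    # io.open decodes each line to a unicode string while reading.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            for word in line.split():
                # word is unicode here, so upper() handles non-Latin letters.
                result.append(word.upper())
    return result
```

Calling upper_words('sample.csv') on the sample from the question should give А, Б, ВВ, ГГ, provided the file really is UTF-8. If it is in another encoding (cp1251, say), pass that as the second argument.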