Trouble reading in Unicode strings from CSV file to DictReader in Python

Question

1.4k views Asked by SpicyClubSauce At 05 December 2024 at 17:08

I have a CSV file I'm trying to read in using DictReader.

But doing just this:

with open("BeerRatings.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line

gives me some ugly unicode as such:

{'Rating': '4', 'Brewery': 'Tr\xc3\xb6egs Brewing Company', 'Beer name': 'Tr\xc3\xb6egs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'P\xc3\xa9ch\xc3\xa9 Mortel - Bourbon Barrel Aged'} etc.

So, after reading on Stack Overflow, I edited my code to this, using the codecs module:

import codecs

with codecs.open("BeerRatings.csv", "r", "utf-8") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line

But this is giving me a UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 9: ordinal not in range(128).

Any tips on how to fix this?

UPDATE aka more flailing around:

def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key: unicode(value, 'utf-8') for key, value in row.iteritems()}

with open("BeerRatings.csv", "r") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print line

This still gives me less-than-ideal output...

{'Rating': u'4', 'Brewery': u'Tr\xf6egs Brewing Company', 'Beer name': u'Tr\xf6egs Hopback Amber Ale'}
{'Rating': u'4.59', 'Brewery': u'Brasserie Dieu Du Ciel', 'Beer name': u'P\xe9ch\xe9 Mortel - Bourbon Barrel Aged'}

Original Q&A

There is 1 answer

**Mark Tolonen** · Accepted Answer · 2015-06-15T20:54:13+00:00

The csv module in Python 2.x expects the input file to be opened in binary mode and does not handle encodings itself. It does work with UTF-8 data, but it returns byte strings, so you have to decode them to Unicode yourself:

import csv

with open('BeerRatings.csv','rb') as f:
    reader = csv.DictReader(f)
    for line in reader:
        for k,v in line.iteritems():
            print k.decode('utf8'),':',v.decode('utf8')
        print

Output:

Rating : 4
Brewery : Tröegs Brewing Company
Beer name : Tröegs Hopback Amber Ale

Rating : 4.59
Brewery : Brasserie Dieu Du Ciel
Beer name : Péché Mortel - Bourbon Barrel Aged
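
To see what the decode step is doing, here is a minimal sketch that takes one of the byte strings from the question's output and decodes it by hand. The variable names are only for illustration, and the snippet should behave the same on Python 2.7 and Python 3:

```python
# Raw bytes as the Python 2 csv module returns them (copied from the question's output).
raw = b'Tr\xc3\xb6egs Brewing Company'

# Decoding interprets the bytes as UTF-8 and produces a text (unicode) string.
text = raw.decode('utf8')

# The two bytes 0xC3 0xB6 become the single character U+00F6,
# so the decoded string is one unit shorter than the byte string.
expected = u'Tr\xf6egs Brewing Company'
print(text == expected)
print(len(raw) - len(text))
```

The `\xc3\xb6` in the first output and the `\xf6` in the later one are the same letter: the former is its UTF-8 byte encoding, the latter its Unicode code point.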

Edit

With your UnicodeDictReader, you still need to print the key/value pairs as I did; otherwise you get the default printing for a dict, which shows escaped data because it uses the repr() of each string. Also open the file in binary mode. It matters on some OSes, particularly Windows.

import csv

def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key.decode('utf8'):value.decode('utf8') for key, value in row.iteritems()}

def prettydict(D):
    return u'{' + u', '.join(u"'{}': '{}'".format(k,v) for k,v in D.iteritems()) + u'}'

with open("BeerRatings.csv", "rb") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print prettydict(line)

Output:

{'Rating': '4', 'Brewery': 'Tröegs Brewing Company', 'Beer name': 'Tröegs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'Péché Mortel - Bourbon Barrel Aged'}
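
If moving to Python 3 is an option (an assumption on my part; everything above is for Python 2), the csv module there works on text rather than bytes, so the file can be opened with an encoding and no manual decoding is needed. A rough sketch, where `read_rows` is a hypothetical helper and the sample file's column order is guessed from the keys in the question:

```python
import csv
import os
import tempfile

def read_rows(path):
    """Return the rows of a UTF-8 CSV file as a list of dicts of text strings (Python 3)."""
    # newline='' is what the csv documentation recommends for files given to a reader.
    with open(path, 'r', encoding='utf-8', newline='') as f:
        return list(csv.DictReader(f))

# Build a small stand-in for BeerRatings.csv (layout assumed, not taken from the real file).
sample = (u'Beer name,Brewery,Rating\r\n'
          u'Tr\xf6egs Hopback Amber Ale,Tr\xf6egs Brewing Company,4\r\n')
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(sample.encode('utf-8'))

rows = read_rows(path)
os.remove(path)

# Values are already text strings; print each pair rather than the whole dict.
for key, value in rows[0].items():
    print(key, ':', value)
```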
