I have a CSV file I'm trying to read in using DictReader.
But doing just this:
import csv

with open("BeerRatings.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line
gives me ugly escaped byte strings like this:
{'Rating': '4', 'Brewery': 'Tr\xc3\xb6egs Brewing Company', 'Beer name': 'Tr\xc3\xb6egs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'P\xc3\xa9ch\xc3\xa9 Mortel - Bourbon Barrel Aged'} etc.
So, reading on Stack Overflow, I edited my code to this, using the codecs module:
import codecs
import csv

with codecs.open("BeerRatings.csv", "r", "utf-8") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line
But this gives me a UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 9: ordinal not in range(128).
Any tips on how to fix this?
UPDATE aka more flailing around:
def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key: unicode(value, 'utf-8') for key, value in row.iteritems()}

with open("BeerRatings.csv", "r") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print line
This still gives me less-than-ideal output:
{'Rating': u'4', 'Brewery': u'Tr\xf6egs Brewing Company', 'Beer name': u'Tr\xf6egs Hopback Amber Ale'}
{'Rating': u'4.59', 'Brewery': u'Brasserie Dieu Du Ciel', 'Beer name': u'P\xe9ch\xe9 Mortel - Bourbon Barrel Aged'}
The csv module in Python 2.X expects the input file to be opened in binary mode and does not support encodings. It is, however, compatible with UTF-8, but you have to decode to Unicode yourself.
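The decoding step could look something like the sketch below. It targets Python 2.7 and is not the code originally posted with this answer; the helper names decode_row and print_ratings are made up for illustration, and the column names come from the output shown in the question. It assumes every row has a value for every column.

```python
import csv


def decode_row(row, encoding='utf-8'):
    """Return a copy of a DictReader row with byte-string keys and values decoded."""
    return {key.decode(encoding): value.decode(encoding)
            for key, value in row.items()}


def print_ratings(path):
    """Print each row's fields as decoded text instead of the dict's repr()."""
    # Python 2 only: the csv module wants the file opened in binary mode.
    with open(path, 'rb') as f:
        for row in csv.DictReader(f):
            decoded = decode_row(row)
            # Printing the unicode string itself shows the real characters,
            # provided the terminal encoding can represent them.
            print(u'{0} | {1} | {2}'.format(
                decoded[u'Brewery'], decoded[u'Beer name'], decoded[u'Rating']))
```

Calling print_ratings("BeerRatings.csv") would then print one line per beer. If stdout is piped rather than a terminal, Python 2 may fall back to ASCII and raise UnicodeEncodeError, in which case encoding the line to UTF-8 before printing is one way around it.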
Edit
Per your UnicodeDictReader, you still need to print the key/value pairs as I did, or you get the default printing for a dict, which shows escaped data via the repr() of the string. Also open the file in binary mode. It matters on some OSes, particularly Windows.
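To illustrate the difference between printing the dict and printing its contents, here is a small sketch. The format_row helper is hypothetical (it is not part of the csv module or of the original answer), and the sample row is copied from the output in the question.

```python
def format_row(row):
    """Build one display line from a row of unicode keys and values."""
    # Formatting the strings themselves, rather than the dict, avoids the
    # repr() escapes such as u'Tr\xf6egs' that Python 2 shows for a dict.
    return u', '.join(u'{0}: {1}'.format(key, row[key]) for key in sorted(row))


# Sample row in the shape UnicodeDictReader yields.
row = {u'Brewery': u'Tr\xf6egs Brewing Company', u'Rating': u'4'}
line = format_row(row)
# Printing line (not row) displays the accented character when the
# terminal encoding supports it.
```

With the UnicodeDictReader from the question and the file opened with "rb", each yielded row could be passed through format_row before printing.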