Getting UnicodeDecodeError when transposing DataFrame in iPython


I am importing an Excel table from http://www.gapminder.org/data/ and then I want to switch the columns and rows of the table. This is the error I am getting: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23389: ordinal not in range(128)"

I tried to decode the DataFrame with DataFrame.decode('utf-8'), but it says that DataFrame has no such attribute.

Does the error occur because transpose cannot convert some data to ASCII? If so, why is that conversion needed when my table is purely numeric?

Thank you so much.

Here is more information on the error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-190-a252f2a45657> in <module>()
      1 #your code here
      2 countries = countries.transpose()
----> 3 income.transpose()
      4 #income = income.decode('utf-8')
      5 #content = content.decode('utf-8')

/Users/Sergey/anaconda/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
    236                 self.write_format_data(format_dict, md_dict)
    237                 self.log_output(format_dict)
--> 238             self.finish_displayhook()
    239 
    240     def cull_cache(self):

/Users/Sergey/anaconda/lib/python2.7/site-packages/IPython/kernel/zmq/displayhook.pyc in finish_displayhook(self)
     70         sys.stderr.flush()
     71         if self.msg['content']['data']:
---> 72             self.session.send(self.pub_socket, self.msg, ident=self.topic)
     73         self.msg = None
     74 

/Users/Sergey/anaconda/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    647         if self.adapt_version:
    648             msg = adapt(msg, self.adapt_version)
--> 649         to_send = self.serialize(msg, ident)
    650         to_send.extend(buffers)
    651         longest = max([ len(s) for s in to_send ])

/Users/Sergey/anaconda/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in serialize(self, msg, ident)
    551             content = self.none
    552         elif isinstance(content, dict):
--> 553             content = self.pack(content)
    554         elif isinstance(content, bytes):
    555             # content is already packed, as in a relayed message

/Users/Sergey/anaconda/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in <lambda>(obj)
     83 # disallow nan, because it's not actually valid JSON
     84 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default,
---> 85     ensure_ascii=False, allow_nan=False,
     86 )
     87 json_unpacker = lambda s: jsonapi.loads(s)

/Users/Sergey/anaconda/lib/python2.7/site-packages/zmq/utils/jsonapi.pyc in dumps(o, **kwargs)
     38         kwargs['separators'] = (',', ':')
     39 
---> 40     s = jsonmod.dumps(o, **kwargs)
     41 
     42     if isinstance(s, unicode):

/Users/Sergey/anaconda/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    248         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    249         separators=separators, encoding=encoding, default=default,
--> 250         sort_keys=sort_keys, **kw).encode(obj)
    251 
    252 

/Users/Sergey/anaconda/lib/python2.7/json/encoder.pyc in encode(self, o)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)
--> 210         return ''.join(chunks)
    211 
    212     def iterencode(self, o, _one_shot=False):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23389: ordinal not in range(128)

After I spent 40 minutes on it, the problem went away and I have no idea why. The only thing I did was add this code:

#encoding=utf8 
import sys
reload(sys)  
sys.setdefaultencoding('utf8')

But when I delete this piece, it still works. Does anybody know why? Thank you!


There is 1 answer

tegancp On

It's a bit difficult to answer authoritatively without more specific knowledge of your ipython session, but here are some educated guesses:

As to why you get that error, even though your data is all numeric, it is most likely from one of the index labels (which presumably contain some text).
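
One way to test this guess is to scan the row and column labels for non-ASCII characters. The sketch below is only an illustration: the helper name non_ascii_labels and the sample labels are made up for the example, and it assumes any byte-string labels are UTF-8 encoded (byte 0xc3 in the error message is a common lead byte for accented Latin letters in UTF-8).

```python
def non_ascii_labels(labels):
    """Return the labels that contain at least one non-ASCII character."""
    flagged = []
    for label in labels:
        if isinstance(label, bytes):
            # Assumption: byte-string labels are UTF-8 encoded.
            text = label.decode('utf-8', 'replace')
        else:
            # Numbers and other objects are converted to text first.
            text = u'%s' % (label,)
        if any(ord(ch) > 127 for ch in text):
            flagged.append(label)
    return flagged


# Hypothetical labels, not taken from the questioner's file.
sample = [u'Sweden', u"C\xf4te d'Ivoire", 1990, b'\xc3\x85land']
print(non_ascii_labels(sample))
```

With a real DataFrame you would pass income.index and income.columns to the helper. An empty result for both would suggest the offending text is somewhere else, for example in the cell values.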

As for why it still works when you delete that code, if you are working in iPython notebook, then once you run the code setting the default encoding to utf-8, that setting stays in effect until either

  • some other code is run that changes the setting, or
  • you restart the python kernel
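
To see which setting is currently in effect, you can ask the interpreter directly. This is a minimal sketch; the function name current_default_encoding is only for illustration. On a freshly started Python 2 kernel it would be expected to report 'ascii', and after the sys.setdefaultencoding('utf8') snippet has run it should report 'utf8' until the kernel is restarted, even if the cell that set it has been deleted.

```python
import sys


def current_default_encoding():
    """Return the interpreter-wide default string encoding."""
    return sys.getdefaultencoding()


print(current_default_encoding())
```

Running this before and after restarting the kernel is a quick way to confirm whether the earlier change is still active.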