StringIO example does not work


I am trying to understand how the numpy.genfromtxt function and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt) I found some examples. Here is one of them:

s = StringIO("1,1.3,abcde")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),('mystring','S5')], delimiter=",")

But when I run this code on my computer I get: TypeError: must be str or None, not bytes

How can I fix this?


There are 2 answers

hpaulj On BEST ANSWER
In [200]: np.__version__
Out[200]: '1.14.0'

The example works for me:

In [201]: s = io.StringIO("1,1.3,abcde")
In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[202]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

It also works for a byte string:

In [204]: s = io.BytesIO(b"1,1.3,abcde")
In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[205]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

genfromtxt works with anything that feeds it lines, so when testing questions I usually pass a list of byte strings directly:

In [206]: s = [b"1,1.3,abcde"]
In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[207]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

Or with several lines:

In [208]: s = b"""1,1.3,abcde
     ...: 4,1.3,two""".splitlines()
In [209]: s
Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[210]: 
array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
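
Since genfromtxt only needs something that yields lines, a generator should work as well as a list. The following is a minimal sketch under that assumption; the function name `rows` is made up for illustration, and it yields byte strings so that it also suits NumPy versions before 1.14:

```python
import numpy as np


def rows():
    """Yield CSV lines one at a time, much as a file object would."""
    yield b"1,1.3,abcde"
    yield b"4,1.3,two"


# Same structured dtype as in the examples above.
dtype = [("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")]

# genfromtxt iterates over the generator and parses each line.
data = np.genfromtxt(rows(), dtype=dtype, delimiter=",")
print(data)
```

A generator is handy when the lines come from some other source (a filtered file, a network response) and you do not want to build the whole list first.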

It used to be that with dtype=None, genfromtxt created S (byte string) fields. See the related question:

NumPy dtype issues in genfromtxt(), reads string in as bytestring

Since 1.14, the encoding argument lets us control the default string dtype:

In [219]: s = io.StringIO("1,1.3,abcde")
In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
  #!/usr/bin/python3
Out[220]: 
array((1, 1.3, b'abcde'),
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', 'S5')])
In [221]: s = io.StringIO("1,1.3,abcde")
In [222]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[222]: 
array((1, 1.3, 'abcde'),
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])

https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions

Now I can generate examples with Py3 strings without producing all those ugly b'string' results (though I have to remember that not everyone has upgraded to 1.14):

In [223]: s = """1,1.3,abcde
     ...: 4,1.3,two""".splitlines()
In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[224]: 
array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
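
For anyone who has not upgraded and is stuck with S fields, one possible workaround is to decode the column after loading. This is only a sketch; it assumes the data is UTF-8 and uses np.char.decode on the 'mystring' field from the earlier examples:

```python
import numpy as np

# Byte-string input and an explicit S5 field, as in the examples above.
lines = [b"1,1.3,abcde", b"4,1.3,two"]
dtype = [("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")]
data = np.genfromtxt(lines, dtype=dtype, delimiter=",")

# Decode the byte-string column into a unicode (U) array.
# Assumption: the bytes are valid UTF-8.
names = np.char.decode(data["mystring"], "utf-8")
print(names)
```

This leaves the structured array itself unchanged and only produces a separate unicode copy of that one column.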
wim On

Consider upgrading numpy: with the current version, your code works as written. See the 1.14.0 release note highlights and the section "Encoding argument for text IO functions" for the relevant changes in np.genfromtxt.

With older numpy, the problem is that you pass a string object as the input, but the docs you linked say:

Note that generators must return byte strings in Python 3k. 

So do what the docs say and give it a byte string:

import io
s = io.BytesIO(b"1,1.3,abcde")
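
Putting it together, here is a hedged end-to-end sketch. The first part shows that Python raises this kind of TypeError whenever a str method is given a bytes argument, which is presumably what happened inside older genfromtxt (I have not traced the exact call). The second part encodes the text, assuming UTF-8, and reads it through BytesIO:

```python
import io

import numpy as np

text = "1,1.3,abcde"

# Mixing str and bytes in a str method raises a TypeError like the one
# reported in the question.
try:
    text.split(b",")
except TypeError as exc:
    message = str(exc)
print(message)

# Encode the text (assumption: UTF-8) and wrap it in BytesIO so that
# genfromtxt receives byte lines.
s = io.BytesIO(text.encode("utf-8"))
dtype = [("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")]
data = np.genfromtxt(s, dtype=dtype, delimiter=",")
print(data)
```

With a single input line the result is a 0-d structured array, so fields are read with data["myint"] rather than by row index.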