Importing from a file vs. the internet with NumPy genfromtxt: date converter raises “TypeError: must be str, not bytes” (Python 3.6)

Strangely, applying np.genfromtxt() to the file goog.csv, in which data downloaded from a source has been stored, produces no error. The code is:

import numpy as np
from matplotlib.dates import bytespdate2num

names = ["A", "B", "C", "D", "E", "F", "G"]
my_array1 = np.genfromtxt("goog.csv",                     
                          delimiter=',',
                          skip_header=1,
                          names=names,
                          dtype=None,
                          converters={0: bytespdate2num('%Y-%m-%d')})
print(my_array1["A"])

Output:

[ 736536.  736535.  736534. ...,  730124.  730123.  730122.]

However, applying the same function to a list whose data has been fetched from the same source, in the same CSV format, produces a TypeError. The code is:

import numpy as np
import requests
from matplotlib.dates import bytespdate2num

# Fetch the data from the internet and store it in a list called stock_data.

source_code = str(requests.get(goog_url, verify=True, auth=('user', 'pass')).content)
stock_data = []
split_source = source_code.split('\\n')
for line in split_source:
    stock_data.append(line)


names = ["A", "B", "C", "D", "E", "F", "G"]
my_array2 = np.genfromtxt(stock_data,
                          delimiter=',',
                          skip_header=1,
                          names=names,
                          dtype=None,
                          converters={0: bytespdate2num('%Y-%m-%d')})
print(my_array2["A"])

Output:

TypeError: must be str or None, not bytes

The data at goog_url and in the file goog.csv has the following format:

2017-07-26,153.3500,153.9300,153.0600,153.5000,153.5000,12778195.00

I could find no reason for the difference, or for the error in the second case.

There are 2 answers

1
hpaulj On

Using decode like this assumes x is a bytestring:

In [127]: datefunc = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d') 
In [128]: datefunc('1999-01-30')
 ....
AttributeError: 'str' object has no attribute 'decode'
In [129]: datefunc(b'1999-01-30')

Without the decode it handles the default PY3 string type:

In [130]: datefunc1 = lambda x: datetime.strptime(x, '%Y-%m-%d') 

In [132]: datefunc1('1999-01-30')
Out[132]: datetime.datetime(1999, 1, 30, 0, 0)

Previously, genfromtxt opened the file in bytestring mode and therefore required this kind of conversion. In the current version it can open the file as unicode, so the decode shouldn't be needed. If your version of genfromtxt accepts an encoding parameter (it may even raise a warning about it), you have the newer version.
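
A converter that tolerates both input types sidesteps the version difference. The following is only a minimal sketch, not part of the original answer: the name parse_date is hypothetical, UTF-8 is assumed, and it returns a datetime (like datefunc above) rather than the matplotlib date number that bytespdate2num produces.

```python
from datetime import datetime


def parse_date(value, fmt="%Y-%m-%d"):
    """Parse a date field given as bytes or str (hypothetical helper)."""
    # Older genfromtxt versions hand converters a bytestring; decode it first.
    if isinstance(value, bytes):
        value = value.decode("utf-8")  # assumption: the data is UTF-8
    # Strip surrounding whitespace so padded fields still parse.
    return datetime.strptime(value.strip(), fmt)


print(parse_date(b"1999-01-30"))  # 1999-01-30 00:00:00
print(parse_date("1999-01-30"))   # 1999-01-30 00:00:00
```

Such a function could be passed as converters={0: parse_date} in place of a decode-only lambda.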

1
Rakesh On

This works for me.

Demo:

import numpy as np
import datetime

datefunc = lambda x: datetime.datetime.strptime(x.decode("utf-8"), '%Y-%m-%d')
Date, Open, High, Low, Close, Adjusted_close, Volume = np.genfromtxt(filename, dtype=None, unpack=True, delimiter=',', converters = {0: datefunc}).tolist()
print(Date)

Output:

2017-07-26 00:00:00
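
Regarding the download code in the question, one likely contributor is that str() applied to a bytes object returns its repr (with the b'...' wrapper and literal backslash-n sequences) rather than decoded text. A minimal sketch, using a hard-coded payload as a stand-in for the real response body and assuming UTF-8:

```python
# Hypothetical payload standing in for requests.get(...).content (assumed UTF-8).
payload = b"Date,Open\n2017-07-26,153.3500\n"

# str() on bytes returns the repr, so the text gains a b'...' wrapper
# and each newline becomes the two characters backslash + n.
as_repr = str(payload)
print(as_repr.split("\\n")[0])  # b'Date,Open

# Decoding yields real text, and splitlines() yields clean rows.
lines = payload.decode("utf-8").splitlines()
print(lines)  # ['Date,Open', '2017-07-26,153.3500']
```

Whether a list of str lines can be passed to genfromtxt directly depends on the NumPy version; with older versions that expect bytes, payload.splitlines() (which keeps the rows as bytes) may be the closer fit.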