I am trying to use the Python module Blaze. It works on small datasets, but when I move to larger, more complex datasets I get errors. I include an example below. Given the error, it seems that Blaze is having trouble parsing the date column as a date. How do I specify the dtype of a specific column as string so that Blaze doesn't try to parse it? Thanks.
In [2]:
from pandas import *
from pylab import *
import pandas as pd
import pylab as plt
import numpy as np
import csv
import statsmodels.api as sm
import matplotlib
%matplotlib inline
import timeit
import blaze as bz
from blaze import *
bz.__version__
Out[2]:
'0.6.5'
In [3]:
t = Table('C:/Users/CRSP 1991 Current.csv')
In [4]:
t.columns
Out[4]:
[u'PERMNO',
u'date',
u'SICCD',
u'PERMCO',
u'PRC',
u'RET',
u'SHROUT',
u'vwretd',
u'ewretd']
In [5]:
t
C:\Users\Anaconda\lib\site-packages\IPython\core\formatters.py:239: FormatterWarning: Exception in text/html formatter: Unable to parse "12/31/1991" as a date
FormatterWarning,
Out[5]:
<repr(<blaze.api.table.Table at 0x186bd3c8>) failed: ValueError: Unable to parse "12/31/1991" as a date>
In [6]:
t_smaller = t.PERMNO
t_smaller
Out[6]:
PERMNO
0 10001
1 10001
2 10001
3 10001
4 10001
5 10001
6 10001
7 10001
8 10001
9 10001
10 10001
In [7]:
t_smaller_10001 = t_smaller[t_smaller == 10001]
t_smaller_10001
Out[7]:
<repr(<blaze.expr.table.Column at 0x18819048>) failed: ValueError: Unable to parse "12/31/1991" as a date>
I believe that this is handled in more recent versions. Try updating Blaze via conda, installing from the blaze channel (the -c blaze option). The main anaconda channel is updated relatively infrequently, whereas the blaze channel is updated on a weekly basis.
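If upgrading is not an option, one possible workaround is to load the file with pandas (already imported in the question) and force the date column to be read as a string. This is only a sketch using pandas rather than Blaze's own API, which may offer a different way to set the schema. The in-memory sample below is hypothetical and stands in for the real CSV; only the column names and the "12/31/1991" value come from the question.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real CSV file;
# in practice, pass the file path to read_csv instead.
sample = io.StringIO(
    "PERMNO,date,PRC\n"
    "10001,12/31/1991,7.5\n"
    "10001,01/31/1992,7.75\n"
)

# dtype maps column names to types; str keeps 'date' as plain text,
# so no date parsing is attempted on that column.
df = pd.read_csv(sample, dtype={"date": str})

first_date = df["date"].iloc[0]
print(type(first_date), first_date)

# Optional: parse the strings explicitly once the format is known.
parsed = pd.to_datetime(df["date"], format="%m/%d/%Y")
print(parsed.iloc[0])
```

Reading the column as text first and converting it afterwards with an explicit format keeps the parsing step under your control, so an unexpected date layout fails in one visible place.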