Bug/Error from Blaze Query


I am trying to use the Python module Blaze. It works on small datasets, but when I move to larger, more complex datasets I get errors. I include an example below. Judging by the error, Blaze is having trouble parsing the date column as a date. How do I specify the dtype of a specific column as string so that Blaze doesn't try to parse it? Thanks.

In [2]:
from pandas import *
from pylab import *
import pandas as pd
import pylab as plt
import numpy as np
import csv
import statsmodels.api as sm
import matplotlib
%matplotlib inline
import timeit
import blaze as bz
from blaze import *
bz.__version__
Out[2]:
'0.6.5'

In [3]:
t = Table('C:/Users/CRSP 1991 Current.csv')

In [4]:
t.columns
Out[4]:
[u'PERMNO',
 u'date',
 u'SICCD',
 u'PERMCO',
 u'PRC',
 u'RET',
 u'SHROUT',
 u'vwretd',
 u'ewretd']

In [5]:
t
C:\Users\Anaconda\lib\site-packages\IPython\core\formatters.py:239: FormatterWarning: Exception in text/html formatter: Unable to parse "12/31/1991" as a date
  FormatterWarning,
Out[5]:
<repr(<blaze.api.table.Table at 0x186bd3c8>) failed: ValueError: Unable to parse "12/31/1991" as a date>

In [6]:
t_smaller = t.PERMNO
t_smaller
Out[6]:
PERMNO
0   10001
1   10001
2   10001
3   10001
4   10001
5   10001
6   10001
7   10001
8   10001
9   10001
10  10001

In [7]:
t_smaller_10001 = t_smaller[t_smaller == 10001]
t_smaller_10001

Out[7]:
<repr(<blaze.expr.table.Column at 0x18819048>) failed: ValueError: Unable to parse "12/31/1991" as a date>

1 Answer

MRocklin

I believe that this is handled in more recent versions. Try updating Blaze via conda:

conda install blaze -c blaze

The main Anaconda channel is updated relatively infrequently. The blaze channel (the -c blaze part of the command) is updated on a weekly basis.
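
If updating is not an option, one possible workaround is to load the file with pandas and tell it to keep the date column as a plain string. Note that this is a sketch using pandas rather than Blaze's own API: the in-memory sample data is made up to stand in for the CRSP file, and only the column names are taken from the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for 'CRSP 1991 Current.csv';
# only the column names come from the question.
sample_csv = io.StringIO(
    "PERMNO,date,PRC\n"
    "10001,12/31/1991,14.5\n"
    "10001,01/31/1992,15.0\n"
)

# Force the date column to be read as strings so nothing tries to parse it.
df = pd.read_csv(sample_csv, dtype={"date": str})

# Optionally parse it afterwards with an explicit month/day/year format.
df["parsed_date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# The filter from the question, expressed on the pandas DataFrame.
subset = df[df["PERMNO"] == 10001]
```

For a real file, the path would replace sample_csv in the read_csv call. Whether the resulting DataFrame can then be passed back into Blaze depends on the installed version, so treat that step as untested here.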