Using StringIO with pandas.read_csv keyword arguments


I am attempting to read in a csv file using pandas.read_csv. I am very confused, since the code works when one types in the csv manually.

import pandas as pd
from six.moves import cStringIO as StringIO

Companies="""
Top,       Equipment,  Users, Neither 
Company 1,       0,     0,  43
Company 2,       0,     0,  32
Company 3,       1,     3,  20
Company 4,       9,     3,  9
Company 5,       8,      7, 3
Company 6,       2,     7,  8
Company 7,       5,     2,  1
Company 8,       1,     4,  1
Company 9,       5,     1,  0
Company 10,      1,     1,  3
Company 11,      2,     2,  0
Company 12,      0,     1,  1
Company 13,      2,     0,  0
Company 14,      1,     0,  0
Company 15,      1,     0,  0
Company 16,      0,     1,  0
"""

Using:

df = pd.read_csv(StringIO(Companies),
                 skiprows=1,
                 skipinitialspace=True,
                 engine='python')

^^ The above works!

However, when I try to read the data from a separate csv file, I keep getting errors.

I tried:

df = pd.read_csv(StringIO('MYDATA.csv', nrows=17, skiprows=1,skipinitialspace=True, delimiter=','))

and got the error TypeError: StringIO() takes no keyword arguments. Originally I got the error TypeError: Must be Convertible to a buffer, not DataFrame, but I can't remember how I got rid of that one.

I looked up the StringIO documentation and other sites including: https://newcircle.com/bookshelf/python_fundamentals_tutorial/working_with_files but I'm stuck!

Answer by Martijn Pieters (best answer)

You closed the parentheses in the wrong location:

df = pd.read_csv(StringIO('MYDATA.csv', nrows=17, skiprows=1,skipinitialspace=True, delimiter=','))
#                        ^            ^ not closed here

Move the closing parenthesis so that it closes the StringIO() call, and leave the keyword arguments for the pd.read_csv() call:

df = pd.read_csv(StringIO('MYDATA.csv'), nrows=17, skiprows=1,skipinitialspace=True, delimiter=',')
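The error message follows from where the keyword arguments end up. Here is a minimal sketch, assuming Python 3 and the standard library's io.StringIO rather than the six.moves import used in the question (the exact wording of the message may differ between versions):

```python
from io import StringIO  # assumption: Python 3 stdlib instead of six.moves

# Keyword arguments placed inside StringIO(...) are passed to StringIO itself,
# which knows nothing about read_csv options such as nrows.
try:
    StringIO('MYDATA.csv', nrows=17)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

Once the parenthesis is moved, StringIO receives only the string and every other option goes to pd.read_csv().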

Note that StringIO('MYDATA.csv') creates an in-memory file whose contents are the literal text MYDATA.csv; it does not open a file with that filename. If you want to open a file on your filesystem named MYDATA.csv, leave off the StringIO call:

df = pd.read_csv('MYDATA.csv', nrows=17, skiprows=1, skipinitialspace=True, delimiter=',')
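To make the difference concrete, here is a rough sketch under a few assumptions: Python 3 with io.StringIO, a temporary file standing in for MYDATA.csv, and a two-row sample in place of the real data. It reads the same CSV text once from a path and once from an in-memory buffer:

```python
import os
import tempfile
from io import StringIO  # assumption: Python 3 stdlib instead of six.moves

import pandas as pd

# Hypothetical sample data; the real MYDATA.csv is not shown in the question.
csv_text = "Top,Equipment,Users,Neither\nCompany 1,0,0,43\nCompany 2,0,0,32\n"

# StringIO wraps the string itself; it never touches the filesystem.
contents = StringIO('MYDATA.csv').read()

# Write the sample to a temporary file and pass the *path* to read_csv.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'MYDATA.csv')
    with open(path, 'w') as handle:
        handle.write(csv_text)
    from_file = pd.read_csv(path, skipinitialspace=True)

# Pass the CSV *text* wrapped in StringIO to read_csv.
from_buffer = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

print(repr(contents))
print(from_file.equals(from_buffer))
```

In short, use StringIO when the CSV text is already in a Python string, and pass the filename directly when the data lives on disk.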