Issue with reading and combining multiple weekly csv files

104 views Asked by At

So I have several years of weekly CSV files that look like, with in the form YYmmdd:

file = 'C:\\rig-070103'

I am trying to read and combine them into one dataset, preferably over a range of dates. So far I have:

pieces = []


for date in range(100):
    path = 'C:\\rig-YYmmdd.csv' % date
    frame = pd.read_csv(path)
    #frame['Date']= date
    pieces.append(frame)
    dataset = pd.concat(pieces, ignore_index=True)
print(dataset)

But this is giving me the error: 
path = 'C:\\rig-YYmmdd.csv' % date
TypeError: not all arguments converted during string formatting

I know this has to do with how I am referencing each file, any suggestions? I would also like to create another column listing the date for each file loaded in, so 1 date repeated over all rows for each file. Any help on this is really appreciated!

Here is an example of the data:

Prov    Location    LSD Section Township    Range   Meridian ...
AB  00-00-006-29W4  0   0   6   29  4
AB  01-18-008-09W4  1   18  8   9   4
AB  05-10-008-10W4  5   10  8   10  4
AB  01-12-008-12W4  1   12  8   12  4
AB  09-23-008-26W4  9   23  8   26  4
AB  13-13-009-25W4  13  13  9   25  4
3

There are 3 answers

3
JD Long On

So there are a few things you're doing here. One is you need a date range where each element is a day. That day then needs to be formatted as yymmdd. Then you pull in the csv into a dataframe. Then add a coumn for the date. Then append that to a main dataframe. Here's an attempt:

import pandas as pd

myDateList = pd.date_range('2003-01-01', periods=100)

myBigDf = pd.DataFrame()

for date in myDateList:
    path = 'C:\\rig-' + date.strftime('%y%m%d') + '.csv' 
    print path # show you what you got
    piece = pd.read_csv(path)
    piece['fileDate'] = date.strftime('%y%m%d')
    myBigDf.append(piece, ignore_index=True)
2
BurningKarl On

As the traceback shows, it's not the right way to use the modulo operator on strings. You can use this tutorial (https://docs.python.org/2.4/lib/typesseq-strings.html) to learn how it is used.

To get a string in the format yymmdd of a given date you can use the datetime module (https://docs.python.org/2/library/datetime.html) and the "strftime" function of datetime.date like this:

from datetime import date 
d = date(2007, 1, 13)
filename = d.strftime("C://rig-%y%m%d.csv")
#filename will be "C://rig-070113.csv"

If you want an iteration over 100 consecutive days, you have to use datetime.timedelta:

from datetime import date, timedelta
one_day = timedelta(1)
act_day = date(2007, 1, 13)
for i in range(100):
    filename = act_day.strftime("C://rig-%y%m%d.csv")
    ...
    act_day += one_day
13
Paul H On

Couple of things going on here:

First, for date in range(100) will iterate through integers 0 through 99. No dates anywhere to be found.

Next, path = 'C:\\rig-YYmmdd.csv' % date isn't valid. Assuming you actually have a datetime object, you would do: path = 'C:\\rig-%s.csv' % date.strftime('%y%m%d')

Finally, you're writing/overwriting your dataframe with every iteration. That silly.

So you your code becomes

import pandas
pieces = []
columns = ['Prov', 'Objective', 'WellStatus', 'WellType']

for date in pandas.date_range('2012-01-01', periods=100):
    path = r'C:\rig-%s.csv' % date.strftime('%y%m%d')
    frame = pandas.read_csv(path, names=columns, encoding='utf-8')
    frame['date'] = date
    pieces.append(frame)

dataset = pandas.concat(pieces, ignore_index=True)
print(dataset)