Python: Trying to shift data in CSV but not seeing desired results

381 views Asked by At

I'm having trouble with a relatively simple problem.. I have some data like so in CSV:

period, reading
1, 3
2, 4
3, 5
4, 2
5, 2

I simply want the second column to shift up by one, while the first column stays the same. After the second column shifts up by one, I would like the last value to be replaced with a random value in the range -5, 5 (hopefully you can see this from my script).

import csv
import random


directory = r"C:/Program Files (x86)/CM/data.csv"

with open(directory, 'r') as csvfile:
    s = csvfile.readlines()

dataCSV = []

for i, point in enumerate(s[1:]):
    seperatedPoint = point.strip("\n").split(",")
    if len(seperatedPoint) == 2:
        dataCSV.append([int(dataPoint) for dataPoint in seperatedPoint])

    l = len(dataCSV)

    for i in range(l-1):
        dataCSV[i][1] = dataCSV[i+1][1]

dataCSV[l-1][1] += random.randint(-5,5)


with open(directory, 'w') as csvfile: #opens the file for writing
    output = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)  
    output.writerow(["period", "reading"]) 
    header = (["period", "reading"])
    print( ", ".join( str(e) for e in header ) )
    print ""
    for point in dataCSV: 
        output.writerow(point) 
        print( ", ".join( str(e) for e in point ) )
        print ""

However instead of shifting up by one, this is just spitting out a the same value a ton of times in the second column.. It will then change values and spit out a ton of those repeated values as well, until I get to the end of my range. I can't figure out what I'm doing wrong. Any help is appreciated, thank you.

3

There are 3 answers

0
Henry Keiter On

You're shifting all the data up every time you add a data point to the list. To see this, add a print line here:

for i in range(len(dataCSV)-1):
    print 'setting index {} to {}'.format(i, dataCSV[i+1][1])
    dataCSV[i][1] = dataCSV[i+1][1]

Which will print out the following on your sample input:

setting index 0 to 4
setting index 0 to 4
setting index 1 to 5
setting index 0 to 5
setting index 1 to 5
setting index 2 to 2
setting index 0 to 5
setting index 1 to 2
setting index 2 to 2
setting index 3 to 2

The large number of "setting..." lines is your first clue that something is wrong: there should only be four shift operations.

The culprit, in this case, is very simple: the loop that performs the shifting is indented too far. It should be at the same level as the loop that initializes dataCSV.

for point in s[1:]:
    seperatedPoint = point.strip("\n").split(",")
    if len(seperatedPoint) == 2:
        dataCSV.append([int(dataPoint) for dataPoint in seperatedPoint])

for i in range(len(dataCSV)-1):
    dataCSV[i][1] = dataCSV[i+1][1]

Mostly unrelated to the question: you could clean up this code and potentially avoid this kind of mishap by cleaning up your file reading. Rather than reading the whole file, and then manually creating lists and throwing each line into the list in an explicit loop, you'd be better off using the csv module to read the file as well as to write it:

with open(filename, 'r') as csvfile:
    reader = csv.reader(csvfile)
    next(reader) # skip the header line
    dataCSV = [[int(s) for s in row] for row in reader]
2
Hugh Bothwell On

This works for both Python 2 and 3, and streams the data rather than holding it all in memory.

There is a fair bit of boilerplate:

import csv
from itertools import chain, tee
import random
import sys

if sys.hexversion < 0x3000000:
    # Python 2.x
    from itertools import izip as _zip
    open_read  = lambda fname: open(fname, "rb")
    open_write = lambda fname: open(fname, "wb")
else:
    # Python 3.x
    _zip = zip
    open_read  = lambda fname: open(fname, "r", newline="")
    open_write = lambda fname: open(fname, "w", newline="")

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return _zip(a, b)

def read_csv(fname, skip_header=True, **kwargs):
    with open_read(fname) as inf:
        incsv = csv.reader(inf, **kwargs)
        if skip_header:
            next(incsv, None)
        for row in incsv:
            yield row

def write_csv(fname, rows, header=None, **kwargs):
    with open_write(fname) as outf:
        outcsv = csv.writer(outf, **kwargs)
        if header:
            outcsv.writerow(header)
        outcsv.writerows(rows)

but one you have that, the solution is simply:

INPUT = "C:/Program Files (x86)/CM/data.csv"
OUTPUT = "C:/Program Files (x86)/CM/data2.csv"

def main():
    tail = ['', random.randint(-5, 5)]
    data = chain(read_csv(INPUT), [tail])
    processed = ([row1[0], row2[1]] for row1, row2 in pairwise(data))
    write_csv(OUTPUT, processed, header=["period", "reading"])

if __name__=="__main__":
    main()
0
acushner On
import pandas as pd
import random

random.seed(time.time())
df = pd.read_csv(filename)
df[col] = df[col].shift(-1)
df.iloc[-1][col] = random.randint(-5, 5)
df.to_csv(outfile)