CSV parsing and manipulation using Python


I have a CSV file which I need to parse using Python.

triggerid,timestamp,hw0,hw1,hw2,hw3
1,234,343,434,78,56
2,454,22,90,44,76

I need to read the file line by line and slice out the triggerid, timestamp and hw3 columns. The column order may change from run to run, so I need to match on the field name, find that column's position, and then write an output file like this:

triggerid,timestamp,hw3
1,234,56
2,454,76

Also, is there a way to generate a hash table (like we have in Perl) so that I can store the entire hw0 column (hw0 as the key and the values in that column as the values) for other modifications?

There are 3 answers

0
holdenweb On BEST ANSWER

I'm unsure what you mean by "count the column".

An easy way to read the data in would use pandas, which was designed for just this sort of manipulation. This creates a pandas DataFrame from your data using the first row as titles.

In [374]: import pandas as pd
In [375]: d = pd.read_csv("30735293.csv")

In [376]: d
Out[376]:
   triggerid  timestamp  hw0  hw1  hw2  hw3
0          1        234  343  434   78   56
1          2        454   22   90   44   76

You can select one of the columns using a single column name, and multiple columns using a list of names:

In [377]: d[["triggerid", "timestamp", "hw3"]]
Out[377]:
   triggerid  timestamp  hw3
0          1        234   56
1          2        454   76
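
To get the output file asked for in the question, the selected columns can be written back out with to_csv. This is only a minimal sketch on the sample data: the io.StringIO input and the name `wanted` are illustrative assumptions, and in a real script you would pass file paths to read_csv and to_csv instead. The columns are deliberately shuffled to show that selecting by name does not depend on their order.

```python
import io
import pandas as pd

# Sample data from the question, with the columns reordered on purpose.
raw = io.StringIO(
    "hw3,triggerid,hw0,timestamp,hw1,hw2\n"
    "56,1,343,234,434,78\n"
    "76,2,22,454,90,44\n"
)
d = pd.read_csv(raw)

wanted = ["triggerid", "timestamp", "hw3"]  # hypothetical name for the column list

# With no path argument, to_csv returns the CSV text as a string;
# pass a filename (e.g. d[wanted].to_csv("out.csv", index=False)) to write a file.
out_text = d[wanted].to_csv(index=False)
print(out_text)
```

index=False keeps the 0, 1, ... row labels out of the output.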

You can also adjust the indexing so that one or more of the data columns are used as index values:

In [378]: d1 = d.set_index("hw0"); d1
Out[378]:
     triggerid  timestamp  hw1  hw2  hw3
hw0
343          1        234  434   78   56
22           2        454   90   44   76

Using the .loc attribute you can retrieve a series for any indexed row:

In [390]: d1.loc[343]
Out[390]:
triggerid      1
timestamp    234
hw1          434
hw2           78
hw3           56
Name: 343, dtype: int64

You can use the column names to retrieve the individual column values from that one-row series:

In [393]: d1.loc[343]["triggerid"]
Out[393]: 1
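
For the hash-table part of the question, a DataFrame can also be turned into ordinary Python containers. A rough sketch, again assuming the sample data from the question (the io.StringIO input and the variable names are only for illustration):

```python
import io
import pandas as pd

raw = io.StringIO(
    "triggerid,timestamp,hw0,hw1,hw2,hw3\n"
    "1,234,343,434,78,56\n"
    "2,454,22,90,44,76\n"
)
d = pd.read_csv(raw)

# One column as a plain Python list.
hw0_values = d["hw0"].tolist()

# Every column at once: {column name: list of that column's values}.
columns = d.to_dict(orient="list")

print(hw0_values)
print(columns["hw3"])
```

The resulting dict plays the role of the Perl hash: the column name is the key and the column's values are the value.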
2
SanjChau On

I used a different approach (using the list .index method):

bpt_mode = ["bpt_mode_64", "bpt_mode_128"]
with open('StripValues.csv') as file:
    # The first line holds the column names; positions are looked up by name.
    stats = next(file).rstrip("\n").split(",")
    for line in file:
        stat_values = line.rstrip("\n").split(",")
        row = [stat_values[stats.index('trigger_id')]]
        for mode in bpt_mode:
            row.append(stat_values[stats.index('hw.gpu.s0.ss0.dg.' + mode)])
        print(",".join(row))

@holdenweb However, I am unable to figure out how to print the output to a file. Currently I am redirecting the output when running the script. Can you provide a solution for writing to a file? There will be multiple writes to a single file.
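
One possible way to write straight to a file without pandas is the standard csv module. This is a sketch, not a drop-in for the script above: the function name slice_columns, the temporary file names, and the sample data are all assumptions made for the demo. csv.DictReader keys each row by column name, so the column order in the input does not matter, and every writerow call goes to the same open file.

```python
import csv
import os
import tempfile

def slice_columns(in_path, out_path, wanted):
    """Copy only the named columns from in_path to out_path."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)  # each row becomes {column name: value}
        # extrasaction="ignore" drops the columns that are not in `wanted`.
        writer = csv.DictWriter(fout, fieldnames=wanted, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)  # one write per input row, all to the same file

# Demo on the sample data from the question, using a temporary directory.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "input.csv")
out_path = os.path.join(tmp_dir, "output.csv")
with open(in_path, "w", newline="") as f:
    f.write("triggerid,timestamp,hw0,hw1,hw2,hw3\n"
            "1,234,343,434,78,56\n"
            "2,454,22,90,44,76\n")

slice_columns(in_path, out_path, ["triggerid", "timestamp", "hw3"])
with open(out_path, newline="") as f:
    result = f.read().splitlines()
print(result)
```

If the file has to be extended across several runs, opening it with mode "a" instead of "w" appends rather than overwrites; in that case the header should only be written once.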

2
mechanical_meat On

Since you already have a solution for the slices, here's something for the hash table part of the question:

import csv
with open('/path/to/file.csv', newline='') as fin:
    ht = {}
    cr = csv.reader(fin)
    # Header row: index 2 is the third column (hw0 in the sample file).
    k = next(cr)[2]
    ht[k] = list()
    for line in cr:
        ht[k].append(line[2])
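
Because the column order may change between runs, a variation that looks columns up by name instead of by a fixed index might be safer. A minimal sketch, assuming the file has a single header row; the helper name columns_by_name and the in-memory sample are made up for illustration:

```python
import csv
import io

def columns_by_name(lines):
    """Return {column name: [values...]} for CSV input with a header row."""
    reader = csv.reader(lines)
    header = next(reader)
    table = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            table[name].append(value)  # values stay as strings
    return table

# Sample data from the question; an open file object works the same way.
sample = io.StringIO(
    "triggerid,timestamp,hw0,hw1,hw2,hw3\n"
    "1,234,343,434,78,56\n"
    "2,454,22,90,44,76\n"
)
ht = columns_by_name(sample)
print(ht["hw0"])
```

The values are kept as strings, as the csv module returns them; convert with int() where numbers are needed.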