CSV parsing and manipulation using Python

Question

133 views Asked by SanjChau At 09 June 2015 at 14:40

I have a CSV file which I need to parse using Python.

triggerid,timestamp,hw0,hw1,hw2,hw3
1,234,343,434,78,56
2,454,22,90,44,76

I need to read the file line by line and extract the triggerid, timestamp and hw3 columns. The column order may change from run to run, so I need to match each field name to find its column position, and then write an output file like this:

triggerid,timestamp,hw3
1,234,56
2,454,76

Also, is there a way to build a hash table (like we have in Perl) so that I can store an entire column, for example hw0 (with hw0 as the key and the values in that column as the values), for later modification?

There are 3 answers

SanjChau On 10 June 2015 at 05:53

I used a different approach (using the list .index() method):

bpt_mode = ["bpt_mode_64", "bpt_mode_128"]
with open('StripValues.csv') as f:
    # Assumption: `stats` is the header row, so positions are looked up by name.
    stats = next(f).rstrip("\n").split(",")
    wanted = ['trigger_id'] + ['hw.gpu.s0.ss0.dg.' + m for m in bpt_mode]
    indexes = [stats.index(name) for name in wanted]
    for line in f:
        stat_values = line.rstrip("\n").split(",")
        # Print only the selected columns, comma-separated.
        print(",".join(stat_values[i] for i in indexes))

@holdenweb However, I am unable to figure out how to print the output to a file. Currently I am redirecting the output when running the script. Can you provide a solution for writing to a file? There will be multiple writes to a single file.
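
One way to write the selected columns to a file, instead of redirecting standard output, is the standard-library csv module. The following is a minimal sketch for Python 3, not the poster's own method: the function name select_columns and the file names in the usage note are hypothetical, and it assumes the first row of the input holds the column names.

```python
import csv

def select_columns(in_path, out_path, wanted, mode="w"):
    """Copy only the `wanted` columns from in_path to out_path, matched by header name."""
    with open(in_path, newline="") as fin, open(out_path, mode, newline="") as fout:
        reader = csv.DictReader(fin)  # the first row supplies the field names
        # extrasaction="ignore" drops every column that is not listed in `wanted`
        writer = csv.DictWriter(fout, fieldnames=wanted, extrasaction="ignore")
        if mode == "w":
            writer.writeheader()      # write the header only when starting a new file
        for row in reader:
            writer.writerow(row)
```

Calling select_columns("input.csv", "output.csv", ["triggerid", "timestamp", "hw3"]) would create the output file; passing mode="a" on later calls appends further rows to the same file, which may suit the "multiple writes to a single file" case.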

mechanical_meat On 09 June 2015 at 15:39

Since you already have a solution for the slices, here's something for the hash-table part of the question:

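import csv
# 'rb' is the mode the csv module expects on Python 2; on Python 3 use open(path, newline='')
with open('/path/to/file.csv','rb') as fin:
    ht = {}
    cr = csv.reader(fin)
    k = next(cr)[2]          # header row; column 2 is hw0 in the sample file
    ht[k] = list()
    for line in cr:
        ht[k].append(line[2])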
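
Because the question says the column order may change between runs, a variant that looks columns up by name rather than by position may be more robust. This is a hedged sketch for Python 3 using csv.DictReader; the function name read_columns is hypothetical, and it stores every column, not just hw0.

```python
import csv
from collections import defaultdict

def read_columns(path):
    """Return {column name: [values as strings]} for every column in the file."""
    columns = defaultdict(list)
    with open(path, newline="") as fin:
        for row in csv.DictReader(fin):  # each row is a dict keyed by header name
            for name, value in row.items():
                columns[name].append(value)
    return dict(columns)
```

With the sample file from the question, read_columns(path)["hw0"] would hold the strings '343' and '22'; convert with int() if numeric values are needed.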

holdenweb On 09 June 2015 at 16:33 (Accepted Answer)

I'm unsure what you mean by "count the column".

An easy way to read the data in would use pandas, which was designed for just this sort of manipulation. This creates a pandas DataFrame from your data using the first row as titles.

In [374]: import pandas as pd
In [375]: d = pd.read_csv("30735293.csv")

In [376]: d
Out[376]:
   triggerid  timestamp  hw0  hw1  hw2  hw3
0          1        234  343  434   78   56
1          2        454   22   90   44   76

You can select one of the columns using a single column name, and multiple columns using a list of names:

In [377]: d[["triggerid", "timestamp", "hw3"]]
Out[377]:
   triggerid  timestamp  hw3
0          1        234   56
1          2        454   76

You can also adjust the indexing so that one or more of the data columns are used as index values:

In [378]: d1 = d.set_index("hw0"); d1
Out[378]:
     triggerid  timestamp  hw1  hw2  hw3
hw0
343          1        234  434   78   56
22           2        454   90   44   76

Using the .loc attribute you can retrieve a series for any indexed row:

In [390]: d1.loc[343]
Out[390]:
triggerid      1
timestamp    234
hw1          434
hw2           78
hw3           56
Name: 343, dtype: int64

You can use the column names to retrieve the individual column values from that one-row series:

In [393]: d1.loc[343]["triggerid"]
Out[393]: 1
