Parse a CSV file into a text file


I am a second-year EE student. I just started learning Python for my project.

I want to parse a CSV file with rows like these:

3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28

into a text file like the following

Toronto 2503281 Montreal 1620693 Vancouver 578041

I am extracting the 1st and 5th columns and saving them to a text file.

This is what I have so far:

import csv
file = open('raw.csv')
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
      f.write("%s %s"%line[1],%line[5])

This is not working for me. I was able to extract the data from the CSV file as line[1] and line[5] (I can print them), but I don't know how to write them to a .text file in the format I want.

Also, I have to turn the first column, e.g. "Toronto (Ont.)", into "Toronto". I am familiar with the find() function, and I assume I could extract Toronto from "Toronto (Ont.)" by using "(" as the stopping character. Despite my research, though, I have no idea how to use it to get the string "Toronto" back.

Here are my questions:

  1. What is the data type of line[1]?
    • If it is a string, why does f.write() not work?
    • If it is not a string, how do I convert it to one?
  2. How do I extract the word Toronto from "Toronto (Ont.)" as a string, using find() or another method?

My thinking is that I could concatenate the two strings, like c = a + ' ' + b, which would give me the format I want. Then I could use f.write() to write the result to a file :)

Sorry if my questions sound too easy or stupid.

Thanks in advance

Zhen


There are 2 answers

user1462309
  1. I don't recall csv that well, so I don't know if it's a string or not. What error are you getting? In any case, assuming it is a string, your line should be:

    f.write("%s %s " % (line[1], line[5]))
    

    In other words, the two values must be wrapped in parentheses so that they form a single tuple. Also, add a trailing space to the format string so that consecutive entries stay separated.

  2. A somewhat hackish but concise way to do this is: line[1].split("(")[0]

    This splits the string on the ( character into a list, from which you take the first element. The result keeps the space before the parenthesis, so strip it if you don't want it.
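To check the first question directly and try both suggestions together, here is a minimal sketch in Python 3 syntax. It assumes one sample row from the question, fed through io.StringIO in place of the real raw.csv; the variable names are only illustrative.

```python
import csv
import io

# One sample row from the question; StringIO stands in for the real raw.csv.
sample = '3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1\n'
row = next(csv.reader(io.StringIO(sample)))

# Every field comes back as a str, including the numeric-looking ones.
field_types = {type(value).__name__ for value in row}

# split("(")[0] keeps everything before the first "(", trailing space included.
city = row[1].split("(")[0]

# With several placeholders, the right-hand side of % must be one tuple.
text = "%s %s " % (city.strip(), row[5])

print(field_types, repr(city), repr(text))
```

Since the fields are already strings, no conversion is needed before calling f.write(); the original line fails because of its syntax, not because of the data type.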

Steinar Lima
  1. All values you get from csv.reader are strings.
  2. There are several ways to do this, but the simplest is to split on ( and strip away any whitespace:

    >>> a = 'Toronto (Ont.)'
    >>> b = a.split('(')
    >>> b
    ['Toronto ', 'Ont.)']
    >>> c = b[0]
    >>> c
    'Toronto '
    >>> c.strip()
    'Toronto'
    

    or in one line:

    >>> print('Toronto (Ont.)'.split('(')[0].strip())
    Toronto
    

    Another option is to use a regular expression (the re module).
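Since the question asks about find() and the answer mentions the re module, here is a rough sketch of both in Python 3 syntax. The helper names city_with_find and city_with_re are hypothetical, and the pattern assumes the province always appears in a trailing parenthesis.

```python
import re


def city_with_find(name):
    """Cut the name at the first "(" using str.find (hypothetical helper)."""
    cut = name.find("(")  # find() returns -1 when "(" is absent
    if cut == -1:
        return name.strip()
    return name[:cut].strip()


def city_with_re(name):
    """Same result with a regular expression (hypothetical helper)."""
    # Drop optional whitespace, then "(" and everything after it.
    return re.sub(r"\s*\(.*$", "", name).strip()


for raw in ["Toronto (Ont.)", "Richmond Hill (Ont.)", "Ottawa"]:
    print(city_with_find(raw), "|", city_with_re(raw))
```

The explicit check for -1 matters: slicing with name[:-1] would silently drop the last character of a name that has no parenthesis.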

The specific problem in your code lies here:

f.write("%s %s"%line[1],%line[5])

When you format a string with the % syntax, the right-hand side has to be either a single value or a tuple holding one value per placeholder. In your case this should be:

f.write("%s %s" % (line[1], line[5]))

Another way to do exactly the same thing is to use the format method:

f.write('{} {}'.format(line[1], line[5]))

This is a flexible way of formatting strings, and I recommend that you read about it in the docs.
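As a small illustration of the point about tuples, the sketch below (Python 3 syntax, with a shortened hypothetical row) shows what happens when only one value is supplied for two placeholders, followed by equivalent ways of building the same string. The f-string variant is not part of the original answer and needs Python 3.6 or later.

```python
# Shortened, hypothetical row: only the first six fields of the sample data.
row = ["3520005", "Toronto (Ont.)", "C ", "F", "2503281", "2481494"]

# A lone string fills only one placeholder, so two %s with one value fails.
try:
    "%s %s" % row[1]
    error = None
except TypeError as exc:
    error = str(exc)

with_percent = "%s %s" % (row[1], row[5])     # tuple on the right-hand side
with_format = "{} {}".format(row[1], row[5])  # str.format equivalent
with_fstring = f"{row[1]} {row[5]}"           # f-string, Python 3.6+ only

print(error)
print(with_percent)
```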


Regarding your code, there are a couple of things you should consider.

  • Always remember to close your file handles. If you use with open(...) as fp, this is taken care of for you.

    with open('myfile.txt') as ifile:
        # Do stuff
    # The file is closed here
    
  • Don't use built-in names as variable names. file is one of them in Python 2 (it is the built-in file type), and by using it for something else (shadowing it) you may cause problems later in your code.

  • To write your data, you can use csv.writer:

    # Python 3: text mode with newline='' (Python 2 used mode 'wb' instead)
    with open('myfile.txt', 'w', newline='') as ofile:
        writer = csv.writer(ofile)
        writer.writerow(['my', 'data'])
    
  • From Python 2.7 (and 3.1) onward, you can combine multiple with statements in one statement:

    with open('raw.csv') as ifile, open('NicelyDone.text','w') as ofile:
        reader = csv.reader(ifile)
        writer = csv.writer(ofile)
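The first point in the list, that with closes the file for you, can be checked with a short sketch. It is written for Python 3 and assumes a throwaway file in a temporary directory; the path and file name are made up for the demonstration.

```python
import os
import tempfile

# Throwaway location so the demo does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as ofile:
    ofile.write("Toronto 2503281 ")
    inside = ofile.closed  # still open inside the block

after = ofile.closed  # closed as soon as the block ends

# The file is also closed when the block exits through an exception.
try:
    with open(path) as ifile:
        raise ValueError("simulated failure")
except ValueError:
    closed_after_error = ifile.closed

print(inside, after, closed_after_error)
```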
    

Combining this knowledge, your script can be rewritten to something like:

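import csv

# Python 3: open both files in text mode with newline='' (Python 2 used 'wb').
with open('raw.csv', newline='') as ifile, open('NicelyDone.text', 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter=' ')
    for row in reader:
        # Keep the text before "(" as the city name; row[5] follows the question's code.
        city, num = row[1].split('(')[0].strip(), row[5]
        writer.writerow([city, num])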
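Two details are worth noting about the csv.writer approach: it writes each pair on its own line, and with a space as delimiter it quotes names that contain a space, such as "Richmond Hill". If the single-line format from the question is what you need, a plain join may be closer. The following is only a sketch in Python 3 syntax: the helper cities_line is hypothetical, and value_col=4 is an assumption based on the sample rows, where index 4 holds the numbers shown in the desired output (the question's code used index 5).

```python
import csv
import io


def cities_line(csv_file, name_col=1, value_col=4):
    """Return 'City value City value ...' from an open CSV file (hypothetical helper).

    value_col=4 is an assumption: in the sample rows, index 4 holds the
    numbers shown in the desired output.
    """
    parts = []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        city = row[name_col].split("(")[0].strip()
        parts.append("{} {}".format(city, row[value_col]))
    return " ".join(parts)


# Two shortened sample rows; StringIO stands in for the real raw.csv.
sample = io.StringIO(
    '3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9\n'
    '5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9\n'
)
print(cities_line(sample))
```

To use it on real files, open raw.csv with newline='' and pass the file object to cities_line, then write the returned string to NicelyDone.text with a single ofile.write() call.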