How to merge two csv files with common column in python 2.5 version

521 views Asked by At

I am very new to this python programming. I have two csv files. I have to just merge them using the common column name. I have been trying it by looking at several other posts. But couldnt get that code running in my 2.5 version of python. So Could anyone help me regarding this. The files may look like this

File1
split_name, vcc, temp, c
A, 1,2,1
B,2,3,5

File 2
split_name, cout, i, vout
A, 2.5,2, 1
B, 2.4,1,8

Result file should be something like this

split_name,vcc,temp,c,cout,i,vout
A,         1,   2,  1, 2.5,2,1
B,         2,   3,  5, 2.4,1,8

The code that I was trying is :

import csv
import array
import os

#def readfile2(file2name):
r = csv.reader(open('file1.csv','r'))
dict2 = {row[0]: row[1:] for row in r}
print str(dict2)
#print dict2.keys()

#def readfile1(file1name):
reader1 = csv.reader(open('file2.csv','r'))
for row in reader1:
    dict1 = {row[0]: row[1:]}
    #print str(dict1)
    #print dict1.values()
    print str(dict1)



keys = set(dict1.keys() + dict2.keys())
with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

But the error that I have encountered is:

keys = set((dict1.keys()) + (dict2.keys())) TypeError: unsupported operand type(s) for +: 'dict_keys' and 'dict_keys'

Note: I have installed python 3.4 version now.

Your help will be greatly appreciated!

2

There are 2 answers

5
cxrodgers On

You can do this most easily using the join function from pandas. If you cannot install pandas, you can reimplement the csv-loading and joining functionality in pure python, but I think in the long run you're better off with pandas.

import pandas
df1 = pandas.read_table('file1.csv')
df2 = pandas.read_table('file2.csv')
joined = df1.join(df2)

You can play around with the parameters to read_table and join to get exactly the behavior you want. Assuming split_name is a unique identifier for each row in both files, you will probably want to use it as the "index" for both of the dataframes.

0
Lehych On

First of all it's better to stick with particular version of python.

You get this error TypeError: unsupported operand type(s) for +: 'dict_keys' and 'dict_keys' because Python 3 return dictionary view for key() method unlike Python 2.5.

You wan to get union of keys from two dicts. You can say it to python this way:

set(dict1.keys()) | set(dict2.keys())

where | operator is union for two sets.

To solve your task should rewrite last line so writerows() got list of strings as argument, not list of lists. I think it would be better for you to use for cycle instead of list comprehension.

Here is the code for Python 3 with some changes and comments:

import csv

reader1 = csv.reader(open('file1.csv','r'))
titles1 = next(reader1)  # First row is titles. So we parse it separetly
dict1 = {row[0]: row[1:] for row in reader1}

reader2 = csv.reader(open('file2.csv','r'))
titles2 = next(reader2)
dict2 = {}  # If we skipt this and do nto change behaviour in cilce we
            # will get only last row every time
for row in reader2:
    dict2[row[0]] = row[1:]

keys = set(dict1.keys()) | set(dict2.keys())
with open('output.csv', 'w', newline='') as f:  # For CVS it's beeter to use
                                                # text mode, not binary.
    w = csv.writer(f, delimiter=',')

    w.writerow(titles1 + titles2)
    for key in keys:
        w.writerow([key, ] +
                dict1.get(key, [''] * (len(titles1)-1)) +
                dict2.get(key, [''] * (len(titles2)-1))
                )