I am very new to Python programming. I have two CSV files that I need to merge on a common column name. I have tried following several other posts, but I couldn't get that code to run under Python 2.5. Could anyone help me with this? The files look like this:
File1
split_name, vcc, temp, c
A, 1,2,1
B,2,3,5
File 2
split_name, cout, i, vout
A, 2.5,2, 1
B, 2.4,1,8
Result file should be something like this
split_name,vcc,temp,c,cout,i,vout
A, 1, 2, 1, 2.5,2,1
B, 2, 3, 5, 2.4,1,8
The code I was trying is:
import csv
import array
import os

#def readfile2(file2name):
r = csv.reader(open('file1.csv','r'))
dict2 = {row[0]: row[1:] for row in r}
print str(dict2)
#print dict2.keys()

#def readfile1(file1name):
reader1 = csv.reader(open('file2.csv','r'))
for row in reader1:
    dict1 = {row[0]: row[1:]}
    #print str(dict1)
    #print dict1.values()
print str(dict1)

keys = set(dict1.keys() + dict2.keys())
with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])
But the error that I have encountered is:
keys = set((dict1.keys()) + (dict2.keys()))
TypeError: unsupported operand type(s) for +: 'dict_keys' and 'dict_keys'
Note: I have now installed Python 3.4.
Your help will be greatly appreciated!
You can do this most easily using the join function from pandas. If you cannot install pandas, you can reimplement the csv-loading and joining functionality in pure python, but I think in the long run you're better off with pandas.
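As a minimal sketch of the pandas route, assuming pandas is installed: the two in-memory buffers below stand in for file1.csv and file2.csv and hold the sample rows from the question, and the choice of sep=",", index_col=0 and skipinitialspace=True is my assumption about how these particular files should be parsed, not something the question confirms.

```python
import io

import pandas as pd

# Stand-ins for file1.csv and file2.csv, using the sample rows from the question.
# With real files, pass the file names to read_table instead of these buffers.
file1 = io.StringIO("split_name, vcc, temp, c\nA, 1,2,1\nB,2,3,5\n")
file2 = io.StringIO("split_name, cout, i, vout\nA, 2.5,2, 1\nB, 2.4,1,8\n")

# index_col=0 makes split_name the index of each dataframe;
# skipinitialspace=True drops the blanks that follow some of the commas.
df1 = pd.read_table(file1, sep=",", index_col=0, skipinitialspace=True)
df2 = pd.read_table(file2, sep=",", index_col=0, skipinitialspace=True)

# join lines the rows up on the index (split_name) and puts the columns side by side.
merged = df1.join(df2)

# to_csv() with no argument returns the CSV text; pass a file name to write a file.
print(merged.to_csv())
```

By default join keeps every row of the left dataframe; passing how="outer" or how="inner" changes which split_name values survive when the two files do not list the same ones.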
You can play around with the parameters to read_table and join to get exactly the behavior you want. Assuming split_name is a unique identifier for each row in both files, you will probably want to use it as the "index" for both of the dataframes.
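If pandas is not an option, the pure-Python route could look roughly like the sketch below. It targets Python 3, where dict.keys() returns a view object that cannot be concatenated with + (the cause of the TypeError in the question) but does support set union with |. The helper name read_keyed is hypothetical, the in-memory buffers again stand in for the two files, and filling missing rows with empty strings is my assumption about the desired behavior.

```python
import csv
import io


def read_keyed(f):
    """Return (header, {first column: remaining columns}) for an open CSV file."""
    reader = csv.reader(f, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    rows = {row[0].strip(): [value.strip() for value in row[1:]]
            for row in reader if row}
    return header, rows


# Stand-ins for file1.csv and file2.csv; with real files in Python 3,
# use open(name, newline="") instead of these buffers.
file1 = io.StringIO("split_name, vcc, temp, c\nA, 1,2,1\nB,2,3,5\n")
file2 = io.StringIO("split_name, cout, i, vout\nA, 2.5,2, 1\nB, 2.4,1,8\n")

header1, rows1 = read_keyed(file1)
header2, rows2 = read_keyed(file2)

# Key views behave like sets, so | gives the union of both key sets.
keys = sorted(rows1.keys() | rows2.keys())

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header1 + header2[1:])
for key in keys:
    # Pad with empty strings when a key is missing from one of the files.
    left = rows1.get(key, [""] * (len(header1) - 1))
    right = rows2.get(key, [""] * (len(header2) - 1))
    writer.writerow([key] + left + right)

print(out.getvalue())
```

Unlike the loop in the question, which rebuilds dict1 on every iteration and so keeps only the last row, this reads every row of both files before merging.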