Join Multiple Files Dictionary

252 views Asked by At

I have a master table that contains some fields. I want to join it with a bunch of other csvs.

Current data looks like:

File 1:

Key  Attrib1  Attrib2  Attrib3  Attrib4

File 2:

Key Attrib5

File 3:

Key Attrib6

I want my final output to look like:

Key   Attrib1  Attrib2  Attrib3  Attrib4 Attrib5 Attrib6, etc.

Not all the files contain the all of the keys.

Current Code:

master = "in.csv"
file1 = "file.csv"
file2 = "file2.csv"
prime = list()
D1 = {}

with open(master) as f:
    for k in csv.reader(f):
        prime.append(k[0])

for k in prime:
    with open(file1,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = dict((row[0],row[1]) for rows in rd)
    with open(file2,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = D1+dict((row[0],row[1]) for rows in rd)
2

There are 2 answers

0
martineau On BEST ANSWER

I think this does close if not exactly what you want:

master = "in.csv"
filelist = "file.csv", "file2.csv"
joined = "joined.csv"
dict1 = {}

with open(master, 'r') as csvfile:
    for row in csv.reader(csvfile):
        key = row[0]
        dict1[key] = row[1:]  # note this does not check for duplicate keys

for filename in filelist:
    with open(filename, 'rb') as csvfile:
        seen = set()
        for row in csv.reader(csvfile):
            key = row[0]
            if key in dict1:
                if key in seen:
                    print('Error: duplicate key %r in file %r - ignored' %
                                   (key, filename))
                else:
                    dict1[key].append(row[1])
                    seen.add(key)
            else:  # key not in master
                pass  # ignore    

        # add null entry for any keys not present in this file
        for key in dict1:
            if key not in seen:
                dict1[key].append(None)

# write the data in the merged dictionary into a new csv file
with open(joined, 'wb') as newcsvfile:
    csv.writer(newcsvfile).writerows(
        ([key]+attrlist) for key, attrlist in sorted(dict1.iteritems()))
0
Christian W. On

The idea here is to open all three files and write them into a new .csv file. The general idea how I would go about joining csv files would be something like this:

import glob
import csv

# gets all the files in your dictionary that end with .csv
csv_files = glob.glob('*.csv')

        # create the new csv file, which will be your output
        with open('filename.csv', 'w') as outfile:
                writer = csv.writer(outfile, delimiter = ',')

                for csv_file in csv_files:
                    with open(csv_file) as infile:
                        reader = csv.reader(infile, delimiter = ',')
                        for row in reader:
                            writer.writerow(row)

You have to manipulate what exactly "row" consists of to make it match with how your data works (create empty columns on the data that don't have the columns you need).

Possible solution is to create a tuple format for each file where you create empty spots for the spots you need to have in them. Writing tuples to the row would work like this.

for row in reader:

    if csv_file == 'file1':
        # '' represents a blank field in column
        data_to_write = (row[0], row[1], '', row[2])

    elif csv_file == 'file2':
        data_to_write = '', row[0], row[1],row[2]

    writer.writerow(data_to_write)