How to compare two text files that share some keys and write a new file containing both the matching keys and the non-matching ones in Python

Input files:

file1.txt

danial,23,janitor
adam,42,waiter
katherine,21,teacher

file2.txt

danial,5,broadway street
brooke,4,hughway street
adam,3,new street

Desired output:

danial,23,janitor,5,broadway street
adam,42,waiter,3,new street
katherine,21,teacher
brooke,4,hughway street

My current code:

with open('C:\\Users\\user\\Desktop\\Dap\\job.txt') as f1, open('C:\\Users\\user\\Desktop\\Dap\\address.txt') as f2:
    # name -> (age, job), built from the first file
    jobs = {}
    for line in f1:
        name, age, job = line.split(',')
        jobs[name] = age, job

    # name -> (number, street), built from the second file
    addresses = {}
    for line in f2:
        name2, num, address = line.split(',')
        addresses[name2] = num, address

    # only the names present in both files
    common = set(jobs.keys() & set(addresses.keys()))
    with open('C:\\Users\\Izz\\Desktop\\Data\\output.txt', 'w') as f:
        for i in common:
            f.write("%s\t%s\t%s\n" % (i, jobs[i], addresses[i]))

Edit:

With this code I only managed to write the entries whose keys appear in both files. I build a dictionary for each file with the first column as the key, but I can only output the keys that the two files have in common.

There is 1 answer

Best answer, by eugenhu

This seems to do what you want:

from collections import defaultdict
import itertools


with open('file1.txt') as f1, open('file2.txt') as f2, open('out.txt', 'w') as out:
    tmp = defaultdict(list)

    for l in itertools.chain(f1, f2):
        l = l.strip()

        if not l:
            continue

        name, a, b = l.split(',')
        tmp[name] += (a, b)

    out.writelines((','.join((k, *v)) + '\n' for k, v in tmp.items()))

Description:

We first create tmp, a defaultdict, to store the various attributes (age, occupation, ...) that each person might have. The defaultdict creates an empty list for us whenever we access a key for the first time. This allows us to do tmp[name] += (a, b) without first checking whether name already exists (and, if not, creating a new list), which improves readability.
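
As a small illustration of that behaviour, here is a sketch comparing a plain dict with a defaultdict. The values come from the sample files; the variable names plain and people are only illustrative.

```python
from collections import defaultdict

# A plain dict raises KeyError when we try to extend a missing key,
# so we would have to create the list ourselves.
plain = {}
try:
    plain["danial"] += ("23", "janitor")
except KeyError:
    plain["danial"] = ["23", "janitor"]

# defaultdict(list) creates the empty list on first access instead.
people = defaultdict(list)
people["danial"] += ("23", "janitor")         # attributes from file1.txt
people["danial"] += ("5", "broadway street")  # attributes from file2.txt

print(people["danial"])
```

The second += finds the list created by the first one and extends it, which is how the attributes from both files end up under the same name.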

itertools.chain(f1, f2) yields every line of f1 followed by every line of f2, so a single loop covers both files. Have a look at the itertools.chain documentation for more detail; the example provided there is pretty concise.
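
A minimal sketch of chaining, using two lists as stand-ins for the open file objects (the names first and second are illustrative only):

```python
import itertools

# Two lists stand in for the open file objects f1 and f2.
first = ["danial,23,janitor\n", "adam,42,waiter\n"]
second = ["danial,5,broadway street\n"]

# chain yields every item of `first`, then every item of `second`,
# without building an intermediate combined list.
combined = list(itertools.chain(first, second))
print(combined)
```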

Iterating through f1 and f2 yields each line of the file, including its trailing newline, so we first use l = l.strip() to strip that (and any other leading or trailing whitespace) off before continuing.

If your input file has blank lines, if not l: continue skips them: after stripping, a blank line is the empty string '', which evaluates to False. We could alternatively have written:

if l:
    # do our stuff

However, this is slightly worse form. Prefer to write the main path of your code as if everything goes as planned, and use if statements to handle the exceptional cases early; this improves readability.
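
To make the strip-then-skip pattern concrete, here is a small sketch. clean_lines is a hypothetical helper written for this illustration, not part of the answer's code.

```python
def clean_lines(lines):
    """Yield stripped, non-blank lines (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:   # exceptional case: blank line, skip it early
            continue
        yield line     # main path stays at one indentation level


raw = ["danial,23,janitor\n", "\n", "   \n", "adam,42,waiter\n"]
cleaned = list(clean_lines(raw))
print(cleaned)
```

Note that a line containing only spaces is also skipped, because strip() reduces it to the empty string.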

We now split each line into its three components with l.split(',') and unpack the result into the variables name, a, b, assuming that the format of your input file will always be the person's name followed by two arbitrary attributes, delimited by commas. (If you're unsure how tuple unpacking works, a general introduction to tuples that covers unpacking is worth reading.)
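
The three-field assumption matters, as the following sketch suggests. The short line "brooke,4" and the address containing an extra comma are made-up inputs for illustration; they do not appear in the question's files.

```python
line = "danial,5,broadway street"

# Exactly three fields: unpacking succeeds.
name, a, b = line.split(",")

# A line with a different number of fields raises ValueError on unpacking.
bad_line = "brooke,4"  # hypothetical malformed line
try:
    name2, a2, b2 = bad_line.split(",")
    unpacked = True
except ValueError:
    unpacked = False

# Passing maxsplit=2 keeps any further commas inside the last field.
parts = "adam,3,new street, apt 2".split(",", 2)  # hypothetical address
print(name, a, b, unpacked, parts)
```

If your real data can contain commas inside a field, the maxsplit form (or the csv module) may be a safer choice than a bare split(',').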

Lists can be extended in place with +=, even from a tuple:

>>> v = [1, 2, 3]
>>> v += (4, 5)
>>> v
[1, 2, 3, 4, 5]

We therefore add the person's attributes a and b to tmp[name] by doing tmp[name] += (a, b).

The last step, now that the tmp dictionary has been constructed with everyone's names and attributes, is to write it to our out file.

out.writelines((','.join((k, *v)) + '\n' for k, v in tmp.items()))

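Here we use a generator expression to format our output. It reads like a list comprehension but produces the lines one at a time instead of building a list first; if you're unsure about this, have a look at the Python documentation on generator expressions. If you're unfamiliar with the * operator, it is used here to unpack v (the list of attributes for the person named k), so that (k, *v) is one flat tuple.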

And then ','.join(lst) will combine the strings in lst (in this case (k, *v)) into one string, each value separated by ','.
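
For a single entry, the unpacking and joining can be sketched like this (k and v hold the values tmp would contain for danial after both sample files are read):

```python
k = "danial"
v = ["23", "janitor", "5", "broadway street"]

# (k, *v) builds the flat tuple
# ("danial", "23", "janitor", "5", "broadway street").
fields = (k, *v)

# join requires every element to be a string, which holds here
# because all values came from split().
row = ",".join(fields)
print(row)
```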

Finally, we add a newline to the end of each line, since out.writelines(lines) doesn't add them for us, and we write our lines to the file with writelines().
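
To try the whole approach without touching the disk, here is a sketch that runs the same logic on the sample data from the question. merge_records is a hypothetical helper name, and io.StringIO objects stand in for the real files; the sketch assumes Python 3.7+ so that the dictionary keeps insertion order.

```python
import io
import itertools
from collections import defaultdict


def merge_records(*sources):
    """Merge comma-separated lines from several iterables by first field.

    Hypothetical helper: same logic as the code above, but it returns
    the merged lines instead of writing them to a file.
    """
    merged = defaultdict(list)
    for line in itertools.chain(*sources):
        line = line.strip()
        if not line:
            continue
        name, a, b = line.split(",")
        merged[name] += (a, b)
    return [",".join((name, *attrs)) + "\n" for name, attrs in merged.items()]


# In-memory stand-ins for file1.txt and file2.txt.
file1 = io.StringIO("danial,23,janitor\nadam,42,waiter\nkatherine,21,teacher\n")
file2 = io.StringIO("danial,5,broadway street\nbrooke,4,hughway street\nadam,3,new street\n")

out = io.StringIO()
out.writelines(merge_records(file1, file2))
print(out.getvalue(), end="")
```

Names are written in the order they are first seen, so the entries from the first file come first and any names that appear only in the second file follow.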