Printing the contents and index location of lines in one file by matching them against another file using Python


I'm new to Python. I want to print the content of a file whose lines look like this:

Mashed Potatoes , topped with this and that ...................... 9.99$

More generally, each line has the form:

Product_name , description ......................... price

I want to match these lines against a second file that contains only the product names:

Mashed Potatoes

Past

Caesar Salad

etc. etc.

The entries in the first file are not in a uniform order, which is why I'm trying a search, match, and print approach.

I hope the problem is clear.

This is what I have tried:

    import re

    content_file = open('/Users/ashishyadav/Downloads/pdfminer-20110515/samples/te.txt', "r")
    product_list = open('/Users/ashishyadav/Desktop/AQ/te.txt', "r")
    output = open("output.txt", "w")
    line = content_file.read().lower().strip()
    for prod in product_list:
        for match in re.finditer(prod.lower().strip(), line):
            s = match.start()
            e = match.end()
            print >>output, match.group(), "\t",
            print >>output, '%d:%d' % (s, e), "\n",

My code matches each entry of the product list file against the full content file, but it gives me only the index of the product name, not of the description and price.

What I want is an index/span from the product name to the price.

For example, for mashed potatoes I want the span from "mashed potatoes" to "9.99$" ([0:58]), but I'm only getting [0:14].

Is there also a way to print the description and price using the same approach?

Thanks in advance.

Best answer, by georg:
  • Read the whole "second file" into a set X.
  • Read the "first" file line by line.
  • For each line, extract the part before the comma.
  • If this part is in the set X, print whatever is desired.

Let me know if you need this in Python.

# Read the whole "second file" into a set X, dropping newlines and blank lines.
with open('foo') as fp:
    names = set(line.strip() for line in fp if line.strip())

# Read the "first" file line by line.
with open('bar') as fp:
    for line in fp:

        # For each line, extract the part before the comma (without padding).
        name = line.split(',')[0].strip()

        # If this part is in the set X, print whatever is desired.
        if name in names:
            print(line.rstrip('\n'))
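
The question also asks for the index/span from the product name to the price. Here is a minimal sketch of one way to get that with the same line-by-line approach, assuming each product sits on its own line and the name is whatever comes before the first comma. The function name `find_product_spans` and the sample menu lines (other than the mashed potatoes one) are made up for illustration, and names are compared case-insensitively, as in the question's code.

```python
def find_product_spans(content, product_names):
    """Return (name, start, end, text) for every line of `content`
    whose part before the first comma is a known product name.

    start/end are character offsets into `content`, so
    content[start:end] is the whole line from name to price.
    """
    # Normalise the wanted names once: no padding, lower case, no blanks.
    wanted = set(n.strip().lower() for n in product_names if n.strip())

    results = []
    offset = 0  # offset of the current line within `content`
    for raw in content.splitlines(True):  # True keeps line endings, so offsets stay exact
        line = raw.rstrip("\r\n")
        name = line.split(",", 1)[0].strip()
        if name.lower() in wanted:
            results.append((name, offset, offset + len(line), line))
        offset += len(raw)
    return results


# Hypothetical sample data standing in for the two files.
sample = (
    "Mashed Potatoes , topped with this and that ..... 9.99$\n"
    "Garlic Bread , toasted ..... 3.50$\n"
    "Caesar Salad , romaine and croutons ..... 7.25$\n"
)

for name, start, end, text in find_product_spans(sample, ["Mashed Potatoes", "Caesar Salad"]):
    print("%s\t%d:%d\t%s" % (name, start, end, text))
```

With real files you would pass `open(...).read()` as `content` and the open product-list file as `product_names`. Because the span covers the whole line, the description and price are simply `text` with the name and comma removed.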