Printing non-matching (unique) lines in a file


I'm trying to create a function that opens a file (filename) and prints each line of the text that differs from the previous line (the first line is always printed). Each output line should be prefixed with its line number in the input file.

I've come up with the following, which consistently prints the last line of the text regardless of whether or not it is a duplicate line:

def squeeze(filename):
    file = open(filename, 'r')
    prevline = ''
    line_num = 0
    for line in file:
        line_num = line_num + 1
        if line != prevline:
            print('%3d - %s' % (line_num, line))
        prevline = line

filename = 'Test.txt'
squeeze(filename)

I can't figure out where the flaw in my code is. How can I fix this?

Thank you all, very helpful! I used the accepted answer.


There are 4 answers

Puffin GDI (best answer)

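Every line read from the file ends with a newline character (\n; in text mode Python translates Windows \r\n endings to \n), except possibly the last one. Your final line has no trailing newline, so it never compares equal to the previous line, even when the text is identical.

You can use str.strip() to remove the newline before comparing: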

def squeeze(filename):
    with open(filename, 'r') as input_f:
        prevline = ''
        line_num = 0

        for line in input_f:
            line_num += 1
            if line.strip() != prevline.strip():     # use strip()
                # rstrip('\n') stops print() from adding a second newline
                print('%3d - %s' % (line_num, line.rstrip('\n')))

            prevline = line
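To see the cause concretely, here is a small sketch that reads an in-memory file instead of Test.txt (io.StringIO stands in for the real file, an assumption made only for illustration). The two lines hold the same text, but only the first ends with a newline:

```python
import io

# Two identical lines, but only the first ends with "\n",
# just like a file whose final line has no trailing newline.
data = io.StringIO("same\nsame")
lines = data.readlines()

print(lines)                                 # ['same\n', 'same']
print(lines[0] == lines[1])                  # False: the '\n' makes them differ
print(lines[0].strip() == lines[1].strip())  # True once the newline is removed
```

This is exactly the comparison the original loop makes on the last two lines of the file.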
ChrisProsser

The only difference between the second-to-last and last lines is the newline character missing from the end of the last line. Here is one way around this:

def squeeze(filename):
    with open(filename, 'r') as file:  # closes the file automatically
        prevline = ''
        line_num = 0
        for line in file:
            line_num = line_num + 1
            trimmed_line = line.strip()
            if trimmed_line != prevline:
                print('%3d - %s' % (line_num, trimmed_line))
            prevline = trimmed_line

filename = 'Test.txt'
squeeze(filename)

Note: strip() removes all leading and trailing whitespace, not just the newline. If that is not what you want, consider using .replace('\n', '') instead, or .rstrip('\n'), which removes only trailing newlines.
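A quick comparison of the three options on a line with indentation and trailing spaces (the sample string is made up for illustration):

```python
line = "  indented text  \n"

# strip() removes whitespace from both ends, including the indentation.
print(repr(line.strip()))            # 'indented text'

# rstrip('\n') removes only trailing newline characters.
print(repr(line.rstrip('\n')))       # '  indented text  '

# replace('\n', '') removes every newline. With Python's default text-mode
# newline handling, a line read from a file holds at most one, at the end,
# so the result matches rstrip('\n').
print(repr(line.replace('\n', '')))  # '  indented text  '
```

Use strip() if leading or trailing spaces should not make two lines count as different; use rstrip('\n') if they should.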

Irshad Bhat

Your code works fine for the file below:

aajgs ajdgadyy
aajgs ajdgadyy
jagshdg ag
ajdgjga
adgha
adgha

The output is:

>>> squeeze(filename)
  1 - aajgs ajdgadyy

  3 - jagshdg ag

  4 - ajdgjga

  5 - adgha

Still, I suggest two modifications to your for loop (strip each line and skip empty lines):

for line in file:
    line_num = line_num + 1  # count every input line, blank ones included
    line = line.strip()  # strip leading and trailing whitespace, including '\n'
    if line == '':
        continue  # skip empty lines
    if line != prevline:
        print('%3d - %s' % (line_num, line))
    prevline = line
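The strip-and-skip loop can be wrapped in a complete function. The sketch below splits it in two: a hypothetical generator, squeezed_lines (not part of the original answer), does the filtering on any iterable of strings, and squeeze only opens the file and prints. Blank lines still advance the line counter, so the printed numbers match the input file:

```python
def squeezed_lines(lines):
    """Hypothetical helper: yield (line_num, text) for each non-blank line
    that differs from the previous non-blank line. line_num counts every
    input line, blank or not."""
    prevline = None  # sentinel, so the first non-blank line is always yielded
    for line_num, line in enumerate(lines, start=1):
        line = line.strip()  # drop leading/trailing whitespace, including '\n'
        if line == '':
            continue  # skip blank lines (they still advance line_num)
        if line != prevline:
            yield line_num, line
        prevline = line


def squeeze(filename):
    with open(filename, 'r') as file:
        for line_num, line in squeezed_lines(file):
            print('%3d - %s' % (line_num, line))
```

Because squeezed_lines accepts any iterable of strings, it can be checked without a file: list(squeezed_lines(["alpha\n", "alpha\n", "\n", "beta\n", "beta"])) gives [(1, 'alpha'), (4, 'beta')]. Note that, as in the loop above, a skipped blank line does not reset prevline, so "alpha", blank, "alpha" prints "alpha" only once.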
Michael

Try storing each line in a list at the end of each pass through the loop; then, before printing in the next pass, check whether the line already exists in the list.
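A minimal sketch of this idea, with one caveat: it suppresses a line if an identical line appeared anywhere earlier in the file, not only on the line directly before it, so it does not match the "differs from the previous line" requirement in the question. The function name unique_lines is hypothetical, and a set is swapped in for the list because membership tests on a set are constant time on average:

```python
def unique_lines(lines):
    """Hypothetical helper: yield (line_num, text) for the first occurrence
    of each distinct line, ignoring the trailing newline."""
    seen = set()  # a list would also work, but 'in' is slower on a list
    for line_num, line in enumerate(lines, start=1):
        line = line.rstrip('\n')  # so a missing final newline doesn't matter
        if line not in seen:
            yield line_num, line
            seen.add(line)


for line_num, line in unique_lines(["a\n", "b\n", "a\n", "b"]):
    print('%3d - %s' % (line_num, line))
```

For that input, the consecutive-duplicate check in the question would print all four lines, while this version prints only lines 1 and 2.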