I'm trying to write code that copies all the duplicate lines of a file to a new file. The program I wrote checks the first 3 elements of each line and compares them to the next line.
f=open(r'C:\Users\xamer\Desktop\file.txt','r')
data=f.readlines()
f.close()
lines=data.copy()
dup=open(r'C:\Users\xamer\Desktop\duplicate.txt','a')
for x in data:
    for y in data:
        if (y[0]==x[0]) and (y[1]==x[1]) and (y[2]==x[2]):
            lines.append(y)
        else:
            lines.remove(y)
dup.write(lines)
dup.close()
I'm getting the following error:
Traceback (most recent call last):
  File "C:\Users\xamer\Desktop\file.py", line 80, in <module>
    lines.remove(y)
ValueError: list.remove(x): x not in list
Any suggestions?
These snippets should do the job you were asking for. At first I thought of building a duplicated_lines list and writing it all out at the end, but then I realized I could improve performance and avoid an extra final loop by writing the repeated items on the fly.

As underlined by another user, it is not really clear whether you want to check only adjacent duplicate entries or repeated items regardless of where they appear in the file.
In the first case - where repetitions immediately follow the line they duplicate - this is the code:
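A minimal sketch of that approach, assuming the file paths from your question and comparing only the first 3 characters of each line; a repeated line is written out as soon as it matches the line right before it:

with open(r'C:\Users\xamer\Desktop\file.txt', 'r') as f, \
     open(r'C:\Users\xamer\Desktop\duplicate.txt', 'a') as dup:
    previous = None
    for line in f:
        # a duplicate is a line whose first 3 characters match the previous line
        if previous is not None and line[:3] == previous[:3]:
            dup.write(line)   # write the repeated line on the fly
        previous = line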
If instead you want to check for repeated lines anywhere in the file, this is the code:
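Again a minimal sketch under the same assumptions, this time remembering every 3-character prefix already seen so a repetition is caught no matter where it appears:

seen = set()
with open(r'C:\Users\xamer\Desktop\file.txt', 'r') as f, \
     open(r'C:\Users\xamer\Desktop\duplicate.txt', 'a') as dup:
    for line in f:
        key = line[:3]        # the first 3 characters identify the line
        if key in seen:
            dup.write(line)   # this prefix was seen before, so it is a duplicate
        else:
            seen.add(key)

The set lookup keeps this a single pass over the file, instead of the nested loop over data in your original code.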