Plagiarism detection using the Damerau-Levenshtein algorithm

How can I implement the Damerau-Levenshtein distance algorithm to detect plagiarism in documents? Thanks!

Answer by Samuel Neff:

Levenshtein distance is primarily used to compare two short strings, for example to match names or to suggest alternatives in a spell checker. Applying it to whole documents to detect plagiarism is not typical, partly because its running time grows with the product of the two input lengths.
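The thread contains no code, so here is a minimal Python sketch of one common variant, the restricted Damerau-Levenshtein distance (also called optimal string alignment). It counts insertions, deletions, substitutions, and transpositions of two adjacent items. Because it compares generic sequences, the same function works on lists of words, which is closer to the word-level edits the paper cited below describes. The names `osa_distance` and `word_similarity` are hypothetical, and splitting on whitespace and normalizing by the longer length are simplifying assumptions, not a standard plagiarism metric.

```python
from typing import Sequence


def osa_distance(a: Sequence, b: Sequence) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Works on strings (compares characters) or lists of words (compares words).
    """
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i items of a and the first j items of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent items
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def word_similarity(text_a: str, text_b: str) -> float:
    """Hypothetical helper: 1.0 for identical word sequences, lower as edits grow."""
    words_a = text_a.lower().split()  # naive whitespace tokenization (assumption)
    words_b = text_b.lower().split()
    longest = max(len(words_a), len(words_b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(words_a, words_b) / longest
```

For example, `osa_distance("kitten", "sitting")` is 3, and `word_similarity("the quick brown fox", "the brown quick fox")` is 0.75, because swapping two adjacent words counts as a single edit. Note the "restricted" part: `osa_distance("CA", "ABC")` is 3, whereas the unrestricted Damerau-Levenshtein distance is 2. Both tables use O(mn) memory and time, so in practice you would compare sentences or paragraphs rather than entire documents.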

There is some work in this area, though. Everything points to this article, which requires a subscription:

Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm

http://www.computer.org/portal/web/csdl/doi/10.1109/ICICIC.2008.422

Plagiarism in texts is an issue of increasing concern to the academic community. The most common form of text plagiarism now consists of making a variety of minor alterations, such as the insertion, deletion, or substitution of words. Such simple changes, however, require excessive string comparisons to detect. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line, derived from the Levenshtein distance, and a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to applying them to plagiarism detection. Our approach avoids globally involved string comparisons and considers psychological factors; experimental results show that it can yield a significant speed-up. Based on these results, we indicate the practicality of such an improvement using the Levenshtein distance and the Smith-Waterman algorithm and illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics in the area of text comparison.
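The abstract does not give enough detail to reproduce the authors' diagonal-line step or their simplification of Smith-Waterman, so the sketch below is only the textbook Smith-Waterman local alignment with a linear gap penalty, applied to lists of words. It is not the paper's method. The function name `smith_waterman_words` is hypothetical, and the scores (match 2, mismatch -1, gap -1) are arbitrary assumptions. The reason to use local alignment is that it finds the best-matching passage even when it is embedded in otherwise unrelated text, where a global edit distance such as Levenshtein would be dominated by the unrelated parts.

```python
from typing import List, Tuple


def smith_waterman_words(
    a: List[str],
    b: List[str],
    match_score: int = 2,
    mismatch_score: int = -1,
    gap_penalty: int = -1,
) -> Tuple[int, Tuple[int, int], Tuple[int, int]]:
    """Textbook Smith-Waterman local alignment over word lists (linear gaps).

    Returns (score, (start_a, end_a), (start_b, end_b)), where the ranges are
    half-open word indices of the best-scoring local alignment.
    """
    def sub(x: str, y: str) -> int:
        return match_score if x == y else mismatch_score

    m, n = len(a), len(b)
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            h[i][j] = max(
                0,                                          # start a new alignment
                h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two words
                h[i - 1][j] + gap_penalty,                  # skip a word of a
                h[i][j - 1] + gap_penalty,                  # skip a word of b
            )
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j

    # Trace back from the best cell until the score drops to zero.
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap_penalty:
            i -= 1
        else:
            j -= 1
    return best, (i, best_i), (j, best_j)
```

For example, with `a = "students must never copy text from other sources".split()` and `b = "as a rule students must never copy text from the web".split()`, `smith_waterman_words(a, b)` returns `(12, (0, 6), (3, 9))`: the six shared words "students must never copy text from" are located in both texts, and the surrounding unrelated words are ignored. Like the edit distance, this table is O(mn), which is the cost the paper's approach is trying to reduce.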