I am writing a plagiarism detection project in Java. As a first step, I need to do the following tasks:
Input a file (.txt, .pdf, .doc)
Convert the file content to text
Remove stop words and tokenize into n-grams
Run text-similarity algorithms on the texts
Report signs of plagiarism
I implemented these steps myself, but the performance is poor, so I have started using existing APIs instead. Has anyone worked with the ws4j library? Is there any documentation or help available for it? I could not figure out how to reuse it. It is exactly what I want; look at the demo.
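For reference, the stop-word removal, n-gram, and similarity steps listed above can be sketched in plain Java as follows. This is only a minimal illustration, not the asker's code: the class name NGramSimilarity, the tiny stop-word list, and the choice of Jaccard similarity over word n-grams are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, written only to illustrate the pipeline steps.
final class NGramSimilarity {
    // Tiny illustrative stop-word list (an assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "are", "of", "and", "or", "to", "in", "on"));

    // Lowercase the text, split on anything that is not a letter or digit, drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build the set of word n-grams, each joined by a single space.
    static Set<String> nGrams(List<String> tokens, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, defined here as 0 for two empty sets.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Convenience wrapper: tokenize both texts, build n-grams, compare.
    static double similarity(String left, String right, int n) {
        return jaccard(nGrams(tokenize(left), n), nGrams(tokenize(right), n));
    }

    public static void main(String[] args) {
        System.out.println(similarity(
                "the quick brown fox jumps", "quick brown fox sleeps", 2));
    }
}
```

Using sets of n-grams keeps each comparison roughly linear in the text length, which may already help with the performance problem before reaching for a semantic library.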
Apart from what you can see on the website, there is no documentation that I could find. I suggest you start by looking at the code (use SVN or git to check it out). Please note that you'll need the binary distribution, because the source is not complete.
The simple tutorial works for most cases. You've probably already found it in the source code:
If you want to compare specific synsets, you'll have to create a Concept first. Example for the most common sense of "jump":
The library doesn't actually work like the online demo. To use the typical notation for synsets, I use my own utility method. So comparing the specific synsets looks like this:
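The utility method itself is not reproduced here. As a rough sketch only, assuming the "typical notation" means the word#pos#sense form used by the online demo (for example jump#v#1), parsing it could look like the code below. SynsetNotation and parse are hypothetical names and not part of ws4j, and mapping the parsed result onto a ws4j Concept is deliberately left out.

```java
// Hypothetical value class for the word#pos#sense notation, e.g. "jump#v#1".
// Not part of ws4j; it only illustrates parsing the notation.
final class SynsetNotation {
    final String word;
    final char pos;   // one of n, v, a, r
    final int sense;  // 1-based sense number; 1 is the most common sense

    private SynsetNotation(String word, char pos, int sense) {
        this.word = word;
        this.pos = pos;
        this.sense = sense;
    }

    // Parse "word#pos#sense"; throws IllegalArgumentException on malformed input.
    static SynsetNotation parse(String notation) {
        String[] parts = notation.trim().split("#");
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].length() != 1) {
            throw new IllegalArgumentException("Expected word#pos#sense: " + notation);
        }
        char pos = parts[1].charAt(0);
        if ("nvar".indexOf(pos) < 0) {
            throw new IllegalArgumentException("Unknown part of speech: " + pos);
        }
        // NumberFormatException is itself an IllegalArgumentException.
        int sense = Integer.parseInt(parts[2]);
        if (sense < 1) {
            throw new IllegalArgumentException("Sense numbers start at 1: " + sense);
        }
        return new SynsetNotation(parts[0], pos, sense);
    }

    @Override
    public String toString() {
        return word + "#" + pos + "#" + sense;
    }
}
```

The parsed word, part of speech, and sense index are the pieces needed to pick one specific synset rather than letting the library choose among all senses of a word.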