Intro to my problem: users can search for terms, and RitaWordNet provides a method called getSenseIds() to get the related senses. I am currently using WS4J (WordNet Similarity for Java, http://code.google.com/p/ws4j/), which offers several algorithms for measuring the distance between words. A search for "user" returns these results:
- user
- exploiter
- drug user
The Lin measure is computed in WS4J by comparing two terms (with the target word, I assume?):
- Similarity between: user and: user = 1.7976931348623157E308
- Similarity between: user and: exploiter = 0.1976958835785797
I would like to return to the end-user a suggestion that the "user" sense is the most relevant/correct answer, but the problem is that this depends on the rest of the sentence.
Example: "The old man was a regular user of public transport", "The young man became a drug user while studying NLP...".
I assume the SenseRelate project includes something that I'm missing. This thread also came up during my search: word disambiguation algorithm (Lesk algorithm)
Hopefully someone got my question :)
You might want to try WordNet::SenseRelate::AllWords - there's an online demo at http://maraca.d.umn.edu
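To see why the surrounding sentence matters, here is a minimal sketch of the simplified Lesk idea mentioned in the question: pick the sense whose gloss shares the most words with the context. This is not how WordNet::SenseRelate::AllWords or WS4J is implemented. The class name, the stop-word list, and the glosses are all hypothetical and hand-written for illustration; in practice the glosses would come from WordNet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SimplifiedLesk {
    // Hypothetical, hand-written glosses (assumption: NOT the real WordNet glosses).
    // Insertion order doubles as the fallback order when no gloss overlaps.
    static final Map<String, String> GLOSSES = new LinkedHashMap<>();
    static {
        GLOSSES.put("user", "a person who makes use of a thing such as a service transport or software");
        GLOSSES.put("exploiter", "a person who uses something or someone selfishly or unethically");
        GLOSSES.put("drug user", "a person who takes an illegal drug habitually");
    }

    // Tiny illustrative stop-word list; a real system would use a fuller one.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "or", "who", "was", "while", "such", "as", "became"));

    // Lowercase the text, split on non-letters, and drop stop words.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // Return the sense whose gloss has the largest word overlap with the sentence.
    static String disambiguate(String sentence, String targetWord) {
        Set<String> context = tokens(sentence);
        context.remove(targetWord.toLowerCase()); // the target itself carries no signal

        String bestSense = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> sense : GLOSSES.entrySet()) {
            Set<String> shared = tokens(sense.getValue());
            shared.retainAll(context); // keep only words that also occur in the context
            if (shared.size() > bestOverlap) { // strict '>' keeps the earlier sense on ties
                bestOverlap = shared.size();
                bestSense = sense.getKey();
            }
        }
        return bestSense;
    }

    public static void main(String[] args) {
        // prints "user" (the gloss shares "transport" with the sentence)
        System.out.println(disambiguate("The old man was a regular user of public transport", "user"));
        // prints "drug user" (the gloss shares "drug" with the sentence)
        System.out.println(disambiguate("The young man became a drug user while studying NLP", "user"));
    }
}
```

With real WordNet glosses the overlap is often zero unless you add stemming and pull in example sentences or related synsets, which is one reason a relatedness-based tool like the one above tends to do better than this bare version.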