I have some very simple code, taken from this example, that uses the Lin, Path and Wu-Palmer similarity measures to compute the similarity between two words. My code is as follows:
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.Lin;
import edu.cmu.lti.ws4j.impl.Path;
import edu.cmu.lti.ws4j.impl.WuPalmer;

public class Test {
    private static ILexicalDatabase db = new NictWordNet();
    private static RelatednessCalculator lin = new Lin(db);
    private static RelatednessCalculator wup = new WuPalmer(db);
    private static RelatednessCalculator path = new Path(db);

    public static void main(String[] args) {
        String w1 = "walk";
        String w2 = "trot";
        System.out.println(lin.calcRelatednessOfWords(w1, w2));
        System.out.println(wup.calcRelatednessOfWords(w1, w2));
        System.out.println(path.calcRelatednessOfWords(w1, w2));
    }
}
The scores are as expected except when both words are identical. If both words are the same (e.g. w1 = "walk"; w2 = "walk";), each of the three measures should return 1.0. Instead, they all return 1.7976931348623157E308.
I have used ws4j before (the same version, in fact), but I have never seen this behavior. Searching online has not yielded any clues. What could possibly be going wrong here?
P.S. That the Lin, Wu-Palmer and Path measures should return 1 for identical words can also be verified with the online demo provided by ws4j.
I raised this issue on the Google Code site for ws4j, and it turned out that it was indeed a bug. The reply I received is as follows:
And here is the (now resolved) issue on their site.
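If you are stuck on a build that still has the bug, one possible workaround is to guard the call yourself. The snippet below is only a minimal sketch and is not part of ws4j: SimilarityGuard and safeScore are hypothetical names, and the scorer is passed in as a plain function so the example compiles without the library. It assumes, as in the question, that these measures should return 1.0 for identical words, and it treats Double.MAX_VALUE (which prints as 1.7976931348623157E308) as the sentinel to replace.

```java
import java.util.function.ToDoubleBiFunction;

public class SimilarityGuard {

    // Hypothetical helper (not a ws4j API): returns 1.0 for identical words,
    // and maps the Double.MAX_VALUE sentinel to 1.0 on the assumption that it
    // only appears when both words resolve to the same sense.
    public static double safeScore(ToDoubleBiFunction<String, String> scorer,
                                   String w1, String w2) {
        if (w1.equals(w2)) {
            return 1.0;
        }
        double score = scorer.applyAsDouble(w1, w2);
        return score == Double.MAX_VALUE ? 1.0 : score;
    }

    public static void main(String[] args) {
        // Stand-in scorer that mimics the behaviour described in the question;
        // the 0.5 for different words is an arbitrary illustrative value.
        ToDoubleBiFunction<String, String> buggy =
                (a, b) -> a.equals(b) ? Double.MAX_VALUE : 0.5;

        System.out.println(safeScore(buggy, "walk", "walk")); // 1.0
        System.out.println(safeScore(buggy, "walk", "trot")); // 0.5
    }
}
```

With the code from the question, the call would look something like safeScore(lin::calcRelatednessOfWords, w1, w2). Whether mapping the sentinel to 1.0 is appropriate for every measure is something to check for your own use case.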