I want to find word similarity using WordNet

I am doing a final-year project on "web video categorization". One part of it is to find the similar words (synonyms) for a particular word, so that I can remove those similar terms.

I know Java, so I chose "WordNet Similarity for Java" (ws4j).

For this I have used only the WS4J 1.0.1 jar file. I have not downloaded any extra files such as the WordNet lexical database or an SQLite database to store it, because the project website states that everything is precompiled into this jar file.

When I executed the demo program SimilarityCalculationDemo.java, I got the following errors:

    java.sql.BatchUpdateException: batch entry 0: [SQLITE_CORRUPT]  The database disk image is malformed (database disk image is malformed)
    at org.sqlite.Stmt.executeBatch(Stmt.java:226)
    at org.sqlite.Stmt.executeBatch(Stmt.java:226)
    at edu.cmu.lti.jawjaw.db.SQL.createIndexIfNotExists(SQL.java:118)
    at edu.cmu.lti.jawjaw.db.SQL.createSQLConnection(SQL.java:98)
    at edu.cmu.lti.jawjaw.db.SQL.<init>(SQL.java:55)
    at edu.cmu.lti.jawjaw.db.SQL.<clinit>(SQL.java:45)
    at edu.cmu.lti.jawjaw.db.WordDAO.findWordsByLemmaAndPos(WordDAO.java:124)
    at edu.cmu.lti.jawjaw.util.WordNetUtil.wordToSynsets(WordNetUtil.java:38)
    at edu.cmu.lti.lexical_db.NictWordNet.getAllConcepts(NictWordNet.java:38)
    at edu.cmu.lti.ws4j.util.WordSimilarityCalculator.calcRelatednessOfWords(WordSimilarityCalculator.java:79)
    at edu.cmu.lti.ws4j.RelatednessCalculator.calcRelatednessOfWords(RelatednessCalculator.java:61)
    at web_cat.SimilarityCalculationDemo.run(SimilarityCalculationDemo.java:37)
    at web_cat.SimilarityCalculationDemo.main(SimilarityCalculationDemo.java:43)
    java.sql.SQLException: [SQLITE_CORRUPT]  The database disk image is malformed (database disk image is malformed)
    at org.sqlite.DB.newSQLException(DB.java:383)
    at org.sqlite.DB.newSQLException(DB.java:387)
    at org.sqlite.DB.throwex(DB.java:374)
    at org.sqlite.NativeDB.prepare(Native Method)
    at org.sqlite.DB.prepare(DB.java:123)
    at org.sqlite.Stmt.execute(Stmt.java:113)
    at edu.cmu.lti.jawjaw.db.SQL.setPragmaCacheSize(SQL.java:137)
    at edu.cmu.lti.jawjaw.db.SQL.createSQLConnection(SQL.java:99)
    at edu.cmu.lti.jawjaw.db.SQL.<init>(SQL.java:55)
    at edu.cmu.lti.jawjaw.db.SQL.<clinit>(SQL.java:45)
    at edu.cmu.lti.jawjaw.db.WordDAO.findWordsByLemmaAndPos(WordDAO.java:124)
    at edu.cmu.lti.jawjaw.util.WordNetUtil.wordToSynsets(WordNetUtil.java:38)
    at edu.cmu.lti.lexical_db.NictWordNet.getAllConcepts(NictWordNet.java:38)
    at edu.cmu.lti.ws4j.util.WordSimilarityCalculator.calcRelatednessOfWords(WordSimilarityCalculator.java:79)
    at edu.cmu.lti.ws4j.RelatednessCalculator.calcRelatednessOfWords(RelatednessCalculator.java:61)
    at web_cat.SimilarityCalculationDemo.run(SimilarityCalculationDemo.java:37)
    at web_cat.SimilarityCalculationDemo.main(SimilarityCalculationDemo.java:43)
    java.sql.SQLException: [SQLITE_CORRUPT]  The database disk image is malformed (database disk image is malformed)
    at org.sqlite.DB.newSQLException(DB.java:383)
    at org.sqlite.DB.newSQLException(DB.java:387)
    at org.sqlite.DB.throwex(DB.java:374)
    at org.sqlite.NativeDB.prepare(Native Method)
    at org.sqlite.DB.prepare(DB.java:123)
    at org.sqlite.PrepStmt.<init>(PrepStmt.java:42)
    at org.sqlite.Conn.prepareStatement(Conn.java:404)
    at org.sqlite.Conn.prepareStatement(Conn.java:399)
    at org.sqlite.Conn.prepareStatement(Conn.java:383)
    at edu.cmu.lti.jawjaw.db.SQL.prepareStatements(SQL.java:151)
    at edu.cmu.lti.jawjaw.db.SQL.<init>(SQL.java:56)
    at edu.cmu.lti.jawjaw.db.SQL.<clinit>(SQL.java:45)
    at edu.cmu.lti.jawjaw.db.WordDAO.findWordsByLemmaAndPos(WordDAO.java:124)
    at edu.cmu.lti.jawjaw.util.WordNetUtil.wordToSynsets(WordNetUtil.java:38)
    at edu.cmu.lti.lexical_db.NictWordNet.getAllConcepts(NictWordNet.java:38)
    at edu.cmu.lti.ws4j.util.WordSimilarityCalculator.calcRelatednessOfWords(WordSimilarityCalculator.java:79)
    at edu.cmu.lti.ws4j.RelatednessCalculator.calcRelatednessOfWords(RelatednessCalculator.java:61)
    at web_cat.SimilarityCalculationDemo.run(SimilarityCalculationDemo.java:37)
    at web_cat.SimilarityCalculationDemo.main(SimilarityCalculationDemo.java:43)
    Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.lti.jawjaw.db.WordDAO.findWordsByLemmaAndPos(WordDAO.java:125)
    at edu.cmu.lti.jawjaw.util.WordNetUtil.wordToSynsets(WordNetUtil.java:38)
    at edu.cmu.lti.lexical_db.NictWordNet.getAllConcepts(NictWordNet.java:38)
    at edu.cmu.lti.ws4j.util.WordSimilarityCalculator.calcRelatednessOfWords(WordSimilarityCalculator.java:79)
    at edu.cmu.lti.ws4j.RelatednessCalculator.calcRelatednessOfWords(RelatednessCalculator.java:61)
    at web_cat.SimilarityCalculationDemo.run(SimilarityCalculationDemo.java:37)
    at web_cat.SimilarityCalculationDemo.main(SimilarityCalculationDemo.java:43)
    Java Result: 1

I am using NetBeans IDE 7.4 with JDK 6.

Could anyone please help me overcome this problem? There is very little documentation about ws4j available on the internet.

There is 1 answer

Leo On BEST ANSWER

Well, I could not reproduce your error. For me it worked out of the box using Eclipse, so I'll try to help by describing exactly what I did:

  1. Download ws4j-1.0.1.jar from https://ws4j.googlecode.com/files/ws4j-1.0.1.jar and ensure its size after the download is 41,362,723 bytes (at least, that's what Eclipse told me on my Linux box).

  2. Use Java 7.

  3. Create a simple Eclipse project and drop the jar there. Then add the jar to the build path (right-click -> Build Path -> Add).

  4. Create an appropriate package and class to accommodate the demo class.

  5. Just run the demo and you'll get something like:

    edu.cmu.lti.ws4j.impl.HirstStOnge   0.0
    edu.cmu.lti.ws4j.impl.LeacockChodorow   1.3862943611198906
    edu.cmu.lti.ws4j.impl.Lesk  0.0
    edu.cmu.lti.ws4j.impl.WuPalmer  0.4
    edu.cmu.lti.ws4j.impl.Resnik    2.5031573470157453
    edu.cmu.lti.ws4j.impl.JiangConrath  0.11150424023847051
    edu.cmu.lti.ws4j.impl.Lin   0.3582442863008455
    edu.cmu.lti.ws4j.impl.Path  0.14285714285714285
    Done in 1951 msec.
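
Since the question says the database ships inside the jar, an incomplete or damaged download is one plausible (though unconfirmed) cause of the SQLITE_CORRUPT error. The size check from step 1 can be automated with a minimal sketch like the one below. The class name `JarCheck` and its helper methods are made up for illustration and are not part of ws4j; the code uses only the Java 7 standard library and assumes the jar sits in the working directory unless a path is passed as the first argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of ws4j: sanity-checks a downloaded jar.
public class JarCheck {
    // Size reported in step 1 of this answer for ws4j-1.0.1.jar.
    static final long EXPECTED_BYTES = 41362723L;

    // True if the file on disk has exactly the expected number of bytes.
    static boolean hasExpectedSize(Path jar, long expectedBytes) throws IOException {
        return Files.size(jar) == expectedBytes;
    }

    // Reads every entry to its end and returns how many entries were read.
    // ZipInputStream throws an IOException if an entry's data is cut short
    // or its CRC does not match, so a normal return means the entries decoded.
    static int countReadableEntries(Path jar) throws IOException {
        int count = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(jar);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                while (zip.read(buffer) != -1) {
                    // Discard the bytes; only the integrity check matters here.
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path jar = Paths.get(args.length > 0 ? args[0] : "ws4j-1.0.1.jar");
        System.out.println("size matches: " + hasExpectedSize(jar, EXPECTED_BYTES));
        System.out.println("readable entries: " + countReadableEntries(jar));
    }
}
```

If the size does not match or reading the entries throws an exception, downloading the jar again before retrying the demo is a reasonable first step. A passing check does not prove the bundled database is intact; it only rules out an obviously broken archive.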
    
