I would like to extract a basic list of synonyms from a database for my search engine. This includes variant spellings of names such as Shaun vs. Shawn, the different variations of Muhammad, and acronyms of named entities such as United Nations (UN) or severe acute respiratory syndrome (SARS).
After extraction, this list of synonyms will be placed on a server and stored as strings of related terms/synonyms.
I have used the JAWS API (Java API for WordNet Searching) and managed to get synonyms for particular words that I entered. This is one of the examples I tried.
Synonyms of NASA:
- National Aeronautics and Space Administration: an independent agency of the United States government responsible for aviation and spaceflight.
The following is the code I have used.
    /**
     * Main entry point. The command-line arguments are concatenated together
     * (separated by spaces) and used as the word form to look up.
     */
    public static void main(String[] args)
    {
        // Fall back to "NASA" as a test input when no argument is given
        if (args.length == 0)
        {
            args = new String[] { "NASA" };
        }
        if (args.length > 0)
        {
            // Concatenate the command-line arguments
            StringBuilder buffer = new StringBuilder();
            for (int i = 0; i < args.length; i++)
            {
                buffer.append((i > 0 ? " " : "") + args[i]);
            }
            String wordForm = buffer.toString();
            // Get the synsets containing the word form
            WordNetDatabase database = WordNetDatabase.getFileInstance();
            Synset[] synsets = database.getSynsets(wordForm);
            // Display the word forms and definitions for synsets retrieved
            if (synsets.length > 0)
            {
                System.out.println("The following synsets contain '" +
                        wordForm + "' or a possible base form " +
                        "of that text:");
                for (int i = 0; i < synsets.length; i++)
                {
                    System.out.println("");
                    String[] wordForms = synsets[i].getWordForms();
                    for (int j = 0; j < wordForms.length; j++)
                    {
                        System.out.print((j > 0 ? ", " : "") +
                                wordForms[j]);
                    }
                    System.out.println(": " + synsets[i].getDefinition());
                }
            }
            else
            {
                System.err.println("No synsets exist that contain " +
                        "the word form '" + wordForm + "'");
            }
        }
        else
        {
            System.err.println("You must specify " +
                    "a word form for which to retrieve synsets.");
        }
    }
However, this method requires me to manually enter every word I want to query. Is there a way to loop through the entire dictionary and store every word and its synonyms in a word list (as plain text)?
Thank you
I'm in the same boat for my project, but I did find someone who had already done various WordNet extractions: https://sourceforge.net/projects/wordnetport/files/?source=navbar
It wasn't a great help for me, since the WordNet synonym groups are pretty shallow, but hopefully they'll do the trick for you (or someone synonymous.)
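If you would rather build the list yourself, one option is to skip the lookup API and read the WordNet data files (data.noun, data.verb, data.adj, data.adv in the dict directory) directly, since each non-header line there describes one synset. What follows is only a rough sketch under that assumption, not something JAWS provides: the class name SynsetDump and the method parseWords are made up for illustration, and the parsing relies on the documented line layout (offset, lexicographer file number, part of speech, a two-digit hexadecimal word count, then word/lex_id pairs).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JAWS or WordNet.
public class SynsetDump {

    /**
     * Extracts the word forms from one line of a WordNet data file.
     * Returns an empty list for the license header lines.
     */
    public static List<String> parseWords(String line) {
        List<String> words = new ArrayList<>();
        // Header lines start with whitespace; synset lines start with the offset.
        if (line.isEmpty() || Character.isWhitespace(line.charAt(0))) {
            return words;
        }
        String[] fields = line.split(" ");
        if (fields.length < 4) {
            return words;
        }
        // fields[3] is the number of words in the synset, in hexadecimal.
        int wordCount = Integer.parseInt(fields[3], 16);
        for (int i = 0; i < wordCount; i++) {
            int index = 4 + 2 * i; // each word is followed by its lex_id
            if (index >= fields.length) {
                break;
            }
            String word = fields[index];
            // Adjectives may carry a syntactic marker such as "(p)"; drop it.
            int paren = word.indexOf('(');
            if (paren > 0) {
                word = word.substring(0, paren);
            }
            // Multi-word entries use underscores instead of spaces.
            words.add(word.replace('_', ' '));
        }
        return words;
    }

    /** Usage: java SynsetDump <data file> <output file> */
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java SynsetDump <data file> <output file>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(
                     Paths.get(args[0]), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                     Paths.get(args[1]), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                List<String> words = parseWords(line);
                // A synset with a single word has no synonyms to record.
                if (words.size() > 1) {
                    out.println(String.join(", ", words));
                }
            }
        }
    }
}
```

Run it once per data file and you get one comma-separated group of synonyms per line. Single-word synsets are skipped on purpose here; drop that check if you want every entry in the output.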