logic programming - find synonyms of a given word

1k views Asked by At

Task is to write a program that can decide whether two words are synonyms or not.

I have pairs of synonymous words, E.g.:

 [big large]  
 [large huge]  
 [small little] 

We can derive the synonymous relationship indirectly: if big is a synonym for large and large is a synonym for huge then big is a synonym for huge.
Being synonyms doesn’t depend on order, e.g. if big is a synonym for large then large is a synonym for big.

The program has to answer whether given two words are synonyms or not.

This seems to me as a good candidate for logic programming, e.g. with clojure.core.logic.

a. How to state the input pairs of synonyms as logical statements/constraints?
b. How to perform a query to find all synonyms of a given word?

Or perhaps I am yack shaving? What would be a simpler way to solve this?

2

There are 2 answers

0
sascha On

In general, this just looks like you are interested in the transitive closure (of a graph)?

For undirected graphs ("Being synonyms doesn’t depend on order"), this is related to Connected Components (and here we directly see some logic-relation as 2-SAT algorithms also exploit connected components).

Maybe see also: Algowiki

Basically:

  • preprocessing-time: calculate the transitive closure of your input
  • query-time -> synonym(a,b): check if edge (a,b) exists

Or without preprocessing: just look for a path on-demand:

  • query-time -> synonym(a,b): check if there is a path between a, b
    • BFS/DFS
0
Harold On

Clojure's data structures make this kind of thing super-fun.

The idea here is that a set of synonyms is itself shared (as the set of synonyms) for all words in the set.

So, we produce a map from words to their set of synonyms:

(require '[clojure.set :as set])

(let [pairs [["big" "large"]
             ["large" "huge"]
             ["small" "little"]]
      m (reduce (fn [eax [w1 w2]]
                  (let [s (set/union #{w1 w2} (get eax w1) (get eax w2))]
                    (merge eax (into {} (map #(vector % s) s)))))
                {}
                pairs)
      syn? (fn [w1 w2] (boolean ((m w1 #{}) w2)))]
  (println "big, large:" (syn? "big" "large"))
  (println "big, huge:" (syn? "big" "huge"))
  (println "huge, big:" (syn? "huge" "big"))
  (println "big, small:" (syn? "big" "small"))
  (println "code, large:" (syn? "code" "large")))

Additional fun Clojure exercises from here:

  1. What is the time and space complexity of building m?

  2. What is the time complexity of syn?

  3. Describe (and then confirm with measurement) the structural sharing going on inside m.