scala spark word2vec try and catch exception


I am trying to create a dictionary using Spark's Word2Vec. In the process, I create an Array of around 200 words and apply the findSynonyms function to each of them. However, out of the 200 words, a few will not return any synonyms (due to the training data size, I suppose). The Spark function then throws an exception that causes the whole job to fail.

What I am trying to do is catch this exception so that, if a word does not generate any synonyms, the process moves on to the next word and returns something like unknown or null.

Here is what I've been doing:

    val synonyms = sc.parallelize(listwords map { x =>
      (x, try { model.findSynonyms(x, 30) } catch { case e: Exception => ("Exception", 0.0) })
    })

However, with the try/catch in place, the inferred type of the pair's second element becomes java.io.Serializable instead of Array[(String, Double)].

Am I doing something wrong with the Try and Catch? Is there a better way to do this?

1 Answer

Answered by Wesley Miao (accepted answer):

Your catch {} should return the same type as findSynonyms, i.e. Array[(String, Double)], instead of a (String, Double). Otherwise the compiler will infer the common parent of Array[(String, Double)] (returned by your try {}) and (String, Double) (returned by your catch {} in case of exception), which is java.io.Serializable.
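To see why the compiler widens the type, here is a minimal, Spark-free sketch. The findSynonyms below is a hypothetical stand-in for model.findSynonyms, written only to throw for an out-of-vocabulary word:

```scala
import scala.util.Try

// Hypothetical stand-in for model.findSynonyms: throws for unknown words.
def findSynonyms(word: String, num: Int): Array[(String, Double)] =
  if (word == "oov-word") throw new IllegalStateException(s"$word not in vocabulary")
  else Array((word + "-syn", 0.9))

// Mismatched branch types widen to their common parent: `widened` is
// inferred as java.io.Serializable, losing the pair/array structure.
val widened = try findSynonyms("oov-word", 30) catch { case e: Exception => ("Exception", 0.0) }

// Keeping both branches the same type preserves Array[(String, Double)].
val fixed: Array[(String, Double)] =
  try findSynonyms("oov-word", 30) catch { case e: Exception => Array.empty[(String, Double)] }

// scala.util.Try is an idiomatic alternative to try/catch expressions.
val viaTry: Array[(String, Double)] =
  Try(findSynonyms("oov-word", 30)).getOrElse(Array.empty[(String, Double)])
```

With consistent branch types, an empty array (or a sentinel like Array(("unknown", 0.0))) marks the words with no synonyms without breaking the RDD's element type.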

Also, do you really want to do the map before parallelize, or after?

I would write it this way (first parallelize listwords to get an RDD, then map over that RDD):

val synonyms = sc.parallelize(listwords) map { x =>
  // Both branches return Array[(String, Double)], so the pair type stays precise.
  (x, try { model.findSynonyms(x, 30) } catch { case e: Exception => Array.empty[(String, Double)] })
}