I am trying to create a dictionary using Spark's Word2Vec. In the process, I create an Array of around 200 words and apply the findSynonyms function to each of them. However, a few of those 200 words will not return any synonyms (due to the training data size, I suppose). The Spark function then throws an exception that stops the whole process.
What I am trying to do is catch this exception so that, if a word does not generate any synonyms, I can move on to the next one and return something like unknown or null.
Here is what I've been doing:
val synonyms = sc.parallelize(listwords map { x =>
  (x, try { model.findSynonyms(x, 30) } catch { case e: Exception => ("Exception", 0.0) })
})
However, using try and catch turns the type of synonyms into java.io.Serializable instead of pairs of (String, Double).
Am I doing something wrong with the try and catch? Is there a better way to do this?
Your `catch {}` should return the same type as your `try {}`, i.e. an `Array[(String, Double)]` (which is what `findSynonyms` returns), instead of a single `(String, Double)`. Otherwise the compiler will try to find the common parent of `Array[(String, Double)]` (returned by your `try {}`) and `(String, Double)` (returned by your `catch {}` in case of an exception), which is `Serializable`.
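To see where the `Serializable` comes from, here is the expression from inside your `map` with the branch types spelled out (using the `model` and a single `word` from your question):

    // try   branch: Array[(String, Double)]  (what findSynonyms returns)
    // catch branch: (String, Double)
    // The only common parent of these two types is java.io.Serializable,
    // so that is what the compiler infers for the whole expression.
    val widened = try {
      model.findSynonyms(word, 30)
    } catch {
      case e: Exception => ("Exception", 0.0)
    }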
Also, do you really want to do `map` before `parallelize`, or after? I would write it this way: first `parallelize` `listwords` to get an RDD, and then `map` over that RDD.
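Something along these lines should work (a sketch, assuming `model` is the mllib `Word2VecModel` and `listwords` is the word collection from your question; here the `catch {}` falls back to an empty array so both branches share the same type):

    val synonyms = sc.parallelize(listwords).map { word =>
      // Both branches return Array[(String, Double)], so nothing
      // gets widened to java.io.Serializable.
      val syns = try {
        model.findSynonyms(word, 30)
      } catch {
        case e: Exception => Array.empty[(String, Double)]
      }
      (word, syns)
    }
    // synonyms: RDD[(String, Array[(String, Double)])]

Note that `model` is captured in the `map` closure and shipped to the executors; if the model is large you might prefer to broadcast it, but that is a separate concern.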