I am trying to run exactly the same code on my MacBook Pro and on an Ubuntu machine on AWS.
My code looks like this (it uses MultinomialNB() from scikit-learn):
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(vectorized_data, labels)
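For comparison, here is a minimal sketch of the input that MultinomialNB.fit accepts. X has to be a 2D array-like or sparse matrix of shape (n_samples, n_features) with non-negative values, and y needs one label per row. The toy numbers are made up for illustration and are not the data from this question.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy data: 4 documents x 3 features (e.g. tf-idf weights), all non-negative.
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2],
    [0.0, 0.7, 0.3],
    [0.1, 0.9, 0.0],
])
y = [0, 0, 1, 1]  # one 0/1 label per row of X

clf = MultinomialNB()
clf.fit(X, y)
print(clf.predict([[0.85, 0.05, 0.1]]))  # [0]
```

Each row of X is a fixed-length vector. If the rows are a Python list of vectors, they all need the same length for scikit-learn to turn them into a 2D array.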
On my MacBook the model trains fine, but on the Ubuntu machine I get:
<ipython-input-5-c52751e2119e> in <module>()
----> 1 m.train_models()
/home/ubuntu/topic_modeling/classification.pyc in train_models(self, minimal)
133 continue
134 bm = BinaryModel(label)
--> 135 bm.train_models(self.vectorizer, self.data)
136 self.models.append(bm)
137 logger.info("Successfully trained model for the %s tag", label)
/home/ubuntu/topic_modeling/classification.pyc in train_models(self, vectorizer, data)
92 # TODO some more complex grid search should be here
93 clf = MultinomialNB()
---> 94 clf.fit(vectorized_data, labels)
95 self.models.append(clf)
96
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in fit(self, X, y, sample_weight)
472 Returns self.
473 """
--> 474 X, y = check_X_y(X, y, 'csr')
475 _, n_features = X.shape
476
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
442 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
443 ensure_2d, allow_nd, ensure_min_samples,
--> 444 ensure_min_features)
445 if multi_output:
446 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features)
342 else:
343 dtype = None
--> 344 array = np.array(array, dtype=dtype, order=order, copy=copy)
345 # make sure we actually converted to numeric:
346 if dtype_numeric and array.dtype.kind == "O":
ValueError: setting an array element with a sequence.
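The ValueError is raised by NumPy, not by MultinomialNB itself: check_array calls np.array on the input (line 344 in the traceback). A common way to trigger exactly this message is a list of feature vectors with different lengths, because NumPy then cannot pack them into a rectangular 2D array. The sketch below reproduces that situation with made-up vectors. It does not show that this is what happens on the Ubuntu machine. It only shows one way to check for it.

```python
import numpy as np

# Two made-up "document" vectors whose lengths differ.
rows = [np.array([0.1, 0.0, 0.5]), np.array([0.2, 0.4])]

# Quick diagnostic: a well-formed feature matrix has exactly one row length.
lengths = {len(r) for r in rows}
print(lengths)  # {2, 3}

try:
    # Requesting a float array forces NumPy to lay the rows out as a grid,
    # which fails when the rows are ragged.
    np.array(rows, dtype=np.float64)
except ValueError as exc:
    print(exc)  # message starts with "setting an array element with a sequence"
```

Running the same length check on vectorized_data on both machines would show whether the rows are ragged on one of them.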
I'd like to run the training on the Ubuntu machine so that I can run it inside screen.
When I run pip freeze, the output is exactly the same on both machines.
Does anyone have an idea what could be wrong?
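pip freeze reports the environment of whichever pip is first on the PATH, and that may not be the interpreter the IPython kernel actually uses. One hedged way to compare the machines is to print the versions from inside the running kernel. The helper library_versions is hypothetical and written only for illustration, and the module list is an assumption. The code runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import sys


def library_versions(names=("numpy", "scipy", "sklearn", "gensim")):
    """Hypothetical helper: map each module name to (version, file path),
    or to None if the module cannot be imported in this interpreter."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = (getattr(module, "__version__", "unknown"),
                          getattr(module, "__file__", None))
    return versions


# Run this in the same IPython session that calls train_models(),
# on both machines, and compare the output.
print("python:", sys.executable, sys.version.split()[0])
for name, info in sorted(library_versions().items()):
    print(name, info)
```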
EDIT
labels is just a list of 0s and 1s, e.g. [0, 1, 0, 0, 0, 1].
vectorized_data is obtained with the gensim framework: I first tokenize the text, then convert it to bag-of-words and apply tf-idf:
bow_text = self.dictionary.doc2bow(tokenized_text)
self.tfidf = models.TfidfModel(dictionary=self.dictionary)
gensim.matutils.sparse2full(self.tfidf[bow_text], self.tfidf.num_nnz)
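gensim.matutils.sparse2full(doc, length) turns a sparse gensim vector (a list of (token_id, weight) pairs) into a dense vector of the given length. To stack the per-document vectors into one 2D matrix, every vector has to use the same length, normally the vocabulary size (for example len(self.dictionary)). According to gensim's documentation, num_nnz counts the non-zero entries in the bag-of-words matrix rather than the number of distinct terms, so the length argument above may be worth double-checking. The sketch below uses a hypothetical helper, to_dense, that mimics sparse2full so it runs without gensim. The tf-idf values are made up.

```python
import numpy as np


def to_dense(doc, length):
    """Hypothetical stand-in for gensim.matutils.sparse2full: convert
    [(token_id, weight), ...] into a dense float vector of fixed length."""
    vec = np.zeros(length, dtype=np.float64)
    for token_id, weight in doc:
        vec[token_id] = weight
    return vec


# Made-up tf-idf output for three documents over a 4-term vocabulary.
tfidf_docs = [
    [(0, 0.7), (2, 0.3)],
    [(1, 1.0)],
    [(0, 0.2), (3, 0.8)],
]
num_terms = 4  # with gensim this would typically be len(self.dictionary)

# All rows share one length, so vstack yields a proper 2D float array
# that can be passed straight to MultinomialNB.fit.
X = np.vstack([to_dense(doc, num_terms) for doc in tfidf_docs])
print(X.shape)  # (3, 4)
```

An alternative design is to keep the data sparse. gensim.matutils.corpus2csc builds a scipy sparse matrix with documents as columns, so its transpose is a documents x terms matrix, which MultinomialNB also accepts.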