ValueError: setting an array element with a sequence. Scikit learn


I am trying to run exactly the same code on two machines: my MacBook Pro and an Ubuntu machine on AWS.

My code looks like this (it uses MultinomialNB() from scikit-learn):

clf = MultinomialNB()
clf.fit(vectorized_data, labels)

On my MacBook the model trains without problems, but on the Ubuntu machine I get:

<ipython-input-5-c52751e2119e> in <module>()
----> 1 m.train_models()

/home/ubuntu/topic_modeling/classification.pyc in train_models(self, minimal)
133                 continue
134             bm = BinaryModel(label)
--> 135         bm.train_models(self.vectorizer, self.data)
136             self.models.append(bm)
137             logger.info("Successfully trained model for the %s tag", label)

/home/ubuntu/topic_modeling/classification.pyc in train_models(self, vectorizer, data)
 92             # TODO some more complex grid search should be here
 93             clf = MultinomialNB()
---> 94         clf.fit(vectorized_data, labels)
 95             self.models.append(clf)
 96
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in fit(self, X, y, sample_weight)
472             Returns self.
473         """
--> 474     X, y = check_X_y(X, y, 'csr')
475         _, n_features = X.shape
476

/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
442     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
443                     ensure_2d, allow_nd, ensure_min_samples,
--> 444                 ensure_min_features)
445     if multi_output:
446         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features)
342             else:
343                 dtype = None
--> 344     array = np.array(array, dtype=dtype, order=order, copy=copy)
345         # make sure we actually converted to numeric:
346         if dtype_numeric and array.dtype.kind == "O":

ValueError: setting an array element with a sequence.

I'd like to run the training on the Ubuntu machine so that I can run it inside a screen session.

The output of pip freeze looks exactly the same on both machines. Does anyone have an idea what could be wrong?

EDIT

labels is just a list of 0s and 1s, e.g. [0, 1, 0, 0, 0, 1]

vectorized_data is obtained using the gensim framework. I first tokenize the text, then convert it to a bag-of-words (bow) representation and a dense TF-IDF vector with:

bow_text = self.dictionary.doc2bow(tokenized_text)
self.tfidf = models.TfidfModel(dictionary=self.dictionary)
gensim.matutils.sparse2full(self.tfidf[bow_text], self.tfidf.num_nnz)
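
One thing worth checking, as a minimal sketch rather than a confirmed diagnosis: NumPy raises "setting an array element with a sequence" when it is asked to build a numeric array from rows of unequal length, which is what check_array attempts at the bottom of the traceback. The snippet below assumes vectorized_data is a list of 1-D vectors; the helper names row_lengths and can_stack and the sample data are hypothetical and only illustrate the check.

```python
import numpy as np


def row_lengths(rows):
    """Return the set of distinct lengths among the row vectors."""
    return {len(row) for row in rows}


def can_stack(rows):
    """Return True if the rows can be converted to a 2-D float array."""
    try:
        np.array(rows, dtype=np.float64)
        return True
    except ValueError:
        # Ragged rows trigger "setting an array element with a sequence"
        return False


# Hypothetical stand-ins for vectorized_data
uniform = [np.zeros(4), np.zeros(4)]
ragged = [np.zeros(4), np.zeros(3)]

print(row_lengths(uniform), can_stack(uniform))
print(row_lengths(ragged), can_stack(ragged))
```

If row_lengths(vectorized_data) returns more than one value on the Ubuntu machine, the vectors are not all the same length. In that case the length argument passed to sparse2full is the first place to look, since it sets the size of each dense vector.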