AttributeError: 'numpy.ndarray' object has no attribute 'A'


I am trying to compute TF-IDF weights for a matrix. I would like to use gensim, but models.TfidfModel() only works on a corpus, so it returns a list of lists of varying lengths, and I want a matrix.

The options are to somehow fill in the missing values of the list of lists, or to convert the corpus to a matrix:

numpy_matrix = gensim.matutils.corpus2dense(corpus, num_terms=number_of_corpus_features)

Choosing the latter, I then try to convert this count matrix to a tf-idf weighted matrix:

def TFIDF(m):
    #import numpy
    WordsPerDoc = numpy.sum(m, axis=0)
    DocsPerWord = numpy.sum(numpy.asarray(m > 0, 'i'), axis=1)
    rows, cols = m.shape
    for i in range(rows):
        for j in range(cols):
            amatrix[i,j] = (amatrix[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])

But I get this error: AttributeError: 'numpy.ndarray' object has no attribute 'A'

I copied the function above from another script. It was:

def TFIDF(self):
    WordsPerDoc = sum(self.A, axis=0)        
    DocsPerWord = sum(asarray(self.A > 0, 'i'), axis=1)
    rows, cols = self.A.shape
    for i in range(rows):
       for j in range(cols):
          self.A[i,j] = (self.A[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])

I believe that is where the A comes from, but I have since re-imported the function.

Why is this happening?

There are 1 answers

Answer by hpaulj (best answer)

In the original script, self is presumably either an np.matrix or a sparse matrix. For both, .A returns the contents as a regular np.ndarray. In other words, it converts the 2-D matrix to a regular NumPy array. If self is already an ndarray, it has no .A attribute, and accessing it produces your error.
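As a small illustration of that difference, here is a minimal sketch. The helper name as_plain_array is hypothetical (it is not part of NumPy or gensim), and the check for a toarray() method assumes that any sparse input is a SciPy sparse matrix.

```python
import numpy as np

mat = np.matrix([[1, 2], [3, 4]])   # np.matrix: has the .A attribute
arr = mat.A                          # plain ndarray holding the same values

print(type(arr).__name__)            # ndarray
print(hasattr(arr, "A"))             # False: asking an ndarray for .A raises AttributeError


def as_plain_array(x):
    """Hypothetical helper: return x as a plain ndarray.

    Works for ndarrays and np.matrix objects, and for objects that expose
    toarray() (assumed here to be SciPy sparse matrices).
    """
    if hasattr(x, "toarray"):
        return x.toarray()
    return np.asarray(x)
```

Calling as_plain_array on the input first means the rest of the code never needs .A at all.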

It looks like you have corrected that with your own version of TFIDF, except that it uses two variables, m and amatrix, in place of self.A, and amatrix is not defined inside the function.
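For reference, a sketch of the same weighting that works on a single array might look like the following. It assumes the layout that corpus2dense produces (terms in rows, documents in columns). The name tfidf, the vectorized form, and the guards against division by zero are my own additions, not part of the original script.

```python
import numpy as np


def tfidf(counts):
    """TF-IDF weights for a (n_terms, n_docs) count matrix.

    Same formula as the loop in the question:
    (count / words_in_doc) * log(n_docs / docs_containing_term)
    """
    counts = np.asarray(counts, dtype=float)      # also accepts np.matrix input
    words_per_doc = counts.sum(axis=0)            # shape (n_docs,)
    docs_per_word = (counts > 0).sum(axis=1)      # shape (n_terms,)
    n_docs = counts.shape[1]

    # Term frequency; empty documents keep a column of zeros (added guard).
    tf = np.divide(counts, words_per_doc,
                   out=np.zeros_like(counts),
                   where=words_per_doc > 0)
    # Inverse document frequency; unused terms are clamped to avoid 0 division.
    idf = np.log(n_docs / np.maximum(docs_per_word, 1))

    return tf * idf[:, np.newaxis]                # new array, input is not modified
```

Unlike the loop version, this returns a new array instead of modifying the input in place, so the result has to be assigned, for example weighted = tfidf(numpy_matrix).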

I think you need to look more closely at the error message and stack trace to identify where that .A is. Also make sure you understand where the code expects a matrix, especially a sparse one, and whether your own code differs in that regard.

I recall from other Stack Overflow questions that one of the learning packages had switched to using sparse matrices, which required adding .todense() to some code that expected dense ones.
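To show what that conversion looks like, here is a minimal sketch that assumes SciPy is installed. The variable names are illustrative only and are not taken from any particular package.

```python
import numpy as np
from scipy import sparse

sp = sparse.csr_matrix(np.array([[0, 2], [1, 0]]))

dense_matrix = sp.todense()   # np.matrix, so it still has .A
dense_array = sp.toarray()    # plain np.ndarray, which has no .A

print(type(dense_matrix).__name__)   # matrix
print(type(dense_array).__name__)    # ndarray
```

Which of the two is right depends on what the downstream code expects. Code that calls .A needs the matrix form, while code written for plain arrays should use toarray().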