Gensim Doc2Vec Exception AttributeError: 'str' object has no attribute 'words'

Question


Asked by Rashmi Singh on 19 December 2016 at 13:07

I am learning the Doc2Vec model from the gensim library and using it as follows:

import os
from nltk.corpus import stopwords
from gensim.models.doc2vec import Doc2Vec, LabeledSentence

class MyTaggedDocument(object):
    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for fname in os.listdir(self.dirname):
            with open(os.path.join(self.dirname, fname), encoding='utf-8') as fin:
                print(fname)
                for item_no, sentence in enumerate(fin):
                    # note: this filter keeps only the stopwords; `not in` would remove them instead
                    yield LabeledSentence([w for w in sentence.lower().split() if w in stopwords.words('english')], [fname.split('.')[0].strip() + '_%s' % item_no])

sentences = MyTaggedDocument(dirname)
model = Doc2Vec(sentences, min_count=2, window=10, size=300, sample=1e-4, negative=5, workers=7)

The input dirname is a directory path that, for simplicity, contains only 2 files, each with more than 100 lines. I am getting the following exception (shown as a screenshot in the original post): AttributeError: 'str' object has no attribute 'words'.

Also, from the print statement I can see that the iterator walks over the directory 6 times. Why is that?

Any kind of help would be appreciated.

Original Q&A

There is 1 answer

**gojomo** · Answer 1 · 2017-01-19T02:52:01+00:00

It looks like one of the text-example objects, which should be shaped like a TaggedDocument (with words and tags properties, formerly called LabeledSentence), is somehow a plain string instead. Are you 100% certain that the error in your screenshot was generated by exactly the iterable code you've included? (The code here looks like it could only emit acceptable LabeledSentence objects.)
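To track down the offending item, one option is to scan the corpus for anything that does not expose both words and tags before handing it to Doc2Vec. The sketch below is pure Python and does not import gensim: TaggedDocument here is a local stand-in mirroring gensim's namedtuple of the same name (fields words and tags), and find_bad_documents is a hypothetical debugging helper, not a gensim API.

```python
from collections import namedtuple

# Local stand-in with the same shape as gensim's TaggedDocument
# (a namedtuple with fields 'words' and 'tags').
TaggedDocument = namedtuple('TaggedDocument', 'words tags')


def find_bad_documents(corpus, limit=5):
    """Hypothetical helper: return (index, item) pairs for items lacking .words/.tags."""
    bad = []
    for i, doc in enumerate(corpus):
        if not (hasattr(doc, 'words') and hasattr(doc, 'tags')):
            bad.append((i, doc))
            if len(bad) >= limit:
                break
    return bad


good = TaggedDocument(words=['hello', 'world'], tags=['doc_0'])
corpus = [good, 'a plain string slipped in']

print(find_bad_documents(corpus))  # [(1, 'a plain string slipped in')]

# Accessing .words on the plain string reproduces the reported message.
try:
    corpus[1].words
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'words'
```

Running such a check over the real iterable (with limit kept small) shows the index and content of the first items that are not document-shaped.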

Your supplied corpus Iterable is read once for an initial scan that discovers all words and tags, then again multiple times for training. The number of training passes is controlled by the iter parameter, which defaults to 5 in recent versions of gensim. So the initial scan plus 5 training passes make 6 total iterations. (10 or more training passes are common with Doc2Vec.)
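The pass count can be reproduced without gensim. In this sketch, CountingCorpus and simulate_training are hypothetical names; simulate_training only mimics the scan-then-train pattern described above (one vocabulary pass, then epochs training passes) and performs no real training. In gensim itself the number of training passes was set with iter in versions of that era (the parameter was renamed epochs in gensim 4.0).

```python
class CountingCorpus:
    """Re-iterable corpus that records how many times it has been iterated."""

    def __init__(self, docs):
        self.docs = docs
        self.passes = 0

    def __iter__(self):
        # Called once at the start of every full pass over the corpus,
        # just as the asker's print(fname) fires once per file per pass.
        self.passes += 1
        return iter(self.docs)


def simulate_training(corpus, epochs=5):
    """Hypothetical stand-in: one vocabulary scan, then `epochs` training passes."""
    vocab = set()
    for doc in corpus:  # pass 1: vocabulary scan
        vocab.update(doc)
    for _ in range(epochs):  # passes 2 .. epochs + 1: training
        for doc in corpus:
            pass  # a real trainer would update vectors here
    return vocab


corpus = CountingCorpus([['a', 'b'], ['b', 'c']])
simulate_training(corpus, epochs=5)
print(corpus.passes)  # 6
```

This repeated iteration is also why the corpus should be a re-iterable object with an __iter__ method, as in the question, rather than a one-shot generator, which would be exhausted after the first pass.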
