sklearn DictVectorizer() throwing error with a dictionary as input

I'm fairly new to sklearn's DictVectorizer, and am trying to create a function whose output DictVectorizer can turn into feature names, built from a list of bigrams that I collect into a feature dictionary. The input to my function is a string, and the function should return a list of feature dictionaries (something like this).

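from typing import Dict, List, Text, Union

from sklearn.feature_extraction import DictVectorizer

def features(sentence) -> List[Dict[Text, Union[Text, int]]]: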
   
    # my feature dictionary should have 'bigram' as the key, and the values will be the bigrams themselves
    # bigram: a form of "w[i]-w[i+1]"
    
    # This is my bigram list (as structured above)
    bigrams: List[Dict[Text, Union[Text, int]]] = []
    
    # here is my code:
    bigrams = {'bigram': i for j in sentence for i in zip(j.split(" ")[:-1], j.split(" ")[1:])}

    return bigrams

vect = DictVectorizer(sparse=False)

text = str()

feature_catalog = features(text)

vect.fit(feature_catalog)

print(sorted(vect.get_feature_names_out()))

Everything works fine until the code reaches the DictVectorizer calls (the error is raised inside the class itself). This is what I get:

AttributeError                            Traceback (most recent call last)
/var/folders/pl/k80fpf9s4f9_3rp8hnpw5x0m0000gq/T/ipykernel_3804/266218402.py in <module>
     22 features = get_feature(text)
     23 
---> 24 vectorizer.fit(features)
     25 
     26 print(sorted(vectorizer.get_feature_names()))

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sklearn/feature_extraction/_dict_vectorizer.py in fit(self, X, y)
    159 
    160         for x in X:
--> 161             for f, v in x.items():
    162                 if isinstance(v, str):
    163                     feature_name = "%s%s%s" % (f, self.separator, v)

AttributeError: 'str' object has no attribute 'items'

Any ideas? This is ultimately going to be used as part of a larger processing effort on a corpus.
