fastText embeddings for logistic regression


I want to compute embeddings and then fit a logistic regression. My output data looks like this:

0       [[-0.00034277988, 0.0013405628, -1.998733e-05,...
1       [[0.00075779966, -0.00025276924, 0.0009634475,...
2       [[-0.0032675266, -0.0015163509, 0.0051634307, ...
3       [[0.0006605284, -0.0040500723, 0.0041460698, -...
                              ...
4774    [[0.0005923094, -0.00194318, 0.0015639212, 0.0...
4775    [[-0.002365636, 0.0023984204, -0.0004855222, -...
4776    [[-0.0028686645, 0.0019738101, 0.0037081288, 0...
4777    [[0.0024941873, -0.0019521558, -0.0019918315, ...
Name: Tweet, Length: 4779, dtype: object

But in order to fit the regression I need the features to be numeric, with each number in its own column: [4778 rows x 768 columns]
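Once each tweet is reduced to a single fixed-length vector (rather than a list of per-word vectors), expanding it into one numeric column per dimension is straightforward. A minimal sketch with a toy two-dimensional Series standing in for the real embeddings column:

```python
import numpy as np
import pandas as pd

# Toy Series of fixed-length vectors standing in for the embeddings column
# (in the real data each entry would be a 128- or 768-dimensional vector):
emb = pd.Series([np.array([0.1, 0.2]), np.array([0.3, 0.4])])

# Stack into a 2-D array and expand so each dimension gets its own column:
features = pd.DataFrame(np.vstack(emb.values))
# features now has one row per tweet and one numeric column per dimension,
# which is the shape scikit-learn's LogisticRegression expects for X.
```

Note that `np.vstack` only works if every entry has the same length, which is why the per-word vectors need to be pooled into one vector per tweet first.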

My fastText code is below. I don't know whether it is better to change the fastText code itself or to transform the embeddings after they are ready.

import pandas as pd
from nltk.tokenize import word_tokenize
from gensim.models import FastText

df = pd.read_csv('OGTDv1.csv')

# Tokenize each tweet; iterate over the column itself, not over
# df.Tweet.to_string(), which is one big string and would yield characters
sentences = [word_tokenize(str(rev).lower()) for rev in df.Tweet]
model = FastText(sentences, vector_size=128, window=5, min_count=3, workers=4, epochs=10, seed=42)
model.save('tokped_review.ft')

def get_sentence_embeddings(sentence, model):
    tokens = word_tokenize(sentence.lower())
    # One vector per in-vocabulary token, so each tweet gets a list of vectors
    embeddings = [model.wv[token] for token in tokens if token in model.wv]
    return embeddings

df_emb = df['Tweet'].apply(lambda x: get_sentence_embeddings(x, model))

print(df_emb)
df_emb.to_pickle('ToxicityFastText_Embeddings.pkl')
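Since each tweet currently maps to a variable-length list of word vectors, one common baseline for getting a fixed-width feature row is to average them (mean pooling). A sketch of that idea, using a toy dict in place of a trained `model.wv` (a gensim keyed-vectors object supports the same `in` and `[]` operations):

```python
import numpy as np

def mean_embedding(tokens, keyed_vectors, dim):
    """Average the word vectors of the in-vocabulary tokens;
    fall back to a zero vector when no token is known."""
    vecs = [keyed_vectors[t] for t in tokens if t in keyed_vectors]
    if not vecs:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vecs, axis=0)

# Toy stand-in for model.wv with two known words and 2-dim vectors:
toy_wv = {"good": np.array([1.0, 0.0]), "bad": np.array([0.0, 1.0])}
v = mean_embedding(["good", "bad", "unknownword"], toy_wv, dim=2)
```

With a real model, something like `np.vstack(df['Tweet'].apply(lambda s: mean_embedding(word_tokenize(s.lower()), model.wv, 128)))` would give one row per tweet and one column per embedding dimension, ready for a logistic regression.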