Issues while encoding and decoding Arabic in the terminal

Question

657 views Asked by Eman Naguib At 10 June 2015 at 17:24

My cosine-similarity script first has to convert Arabic strings into vectors. When I run it in a terminal on Linux, the Arabic text is printed as Unicode escape sequences instead of readable characters:

[u'\u0627\u0644\u0634\u0645\u0633 \u0645\u0634\u0631\u0642\u0647 \u0646\u0647\u0627\u0631\u0627', u'\u0627\u0644\u0633\u0645\u0627\u0621 \u0632\u0631\u0642\u0627\u0621']

My script:

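# -*- coding: utf-8 -*-
# Imports inferred from the names used below (the original post omitted them).
import numpy as np
import numpy.linalg as LA
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

train_set = ["السماء زرقاء", "الشمس مشرقه نهارا"]  # Documents
test_set = ["الشمس التى فى السماء مشرقه", "السماء زرقاء"]  # Query
stopWords = set(stopwords.words('english'))

vectorizer = CountVectorizer(stop_words=stopWords)
transformer = TfidfTransformer()
trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
testVectorizerArray = vectorizer.transform(test_set).toarray()
print 'Fit Vectorizer to train set', trainVectorizerArray
print 'Transform Vectorizer to test set', testVectorizerArray
# Cosine similarity: inner product divided by the product of the norms.
cx = lambda a, b: round(np.inner(a, b) / (LA.norm(a) * LA.norm(b)), 3)

# Compare every training vector with every query vector.
for vector in trainVectorizerArray:
    print vector
    for testV in testVectorizerArray:
        print testV
        cosine = cx(vector, testV)
        print cosine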
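
The cx lambda in the script divides by the product of the two norms, which is zero whenever either vector is all zeros (for example, a query that shares no words with the training vocabulary). A minimal sketch of the same formula with that case handled, written in plain Python 3 without NumPy. The function name cosine_similarity and the choice to return 0.0 for a zero vector are assumptions, not part of the original script:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a zero vector is treated as dissimilar to everything.
        return 0.0
    # Rounded to three decimals, as in the question's cx lambda.
    return round(dot / (norm_a * norm_b), 3)


print(cosine_similarity([1, 1, 0], [1, 1, 0]))  # identical vectors
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # no shared terms
print(cosine_similarity([0, 0, 0], [1, 1, 0]))  # zero vector
```

Returning 0.0 for a zero vector is only one possible convention; raising an error or returning None may fit better, depending on how the scores are used afterwards.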

Original Q&A

There is 1 answer

**Assem** · Answer 1 · 2016-01-16T00:03:49+00:00

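Your result is a list of strings. Printing the list itself shows the escaped repr of each element, not the characters. Join the strings (the list is called a here) and print the result to get readable text: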

>>> print "\n".join(a)
الشمس مشرقه نهارا
السماء زرقاء
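
The difference between the two outputs can be reproduced in a few lines. The sketch below assumes Python 3, whereas the question uses Python 2 syntax. Python 3 already prints readable text for a list of strings, so ascii() is used here to produce escaped output similar to the question's. The name docs is illustrative:

```python
# Illustrative data: the two strings from the question's output.
docs = [
    "\u0627\u0644\u0634\u0645\u0633 \u0645\u0634\u0631\u0642\u0647 \u0646\u0647\u0627\u0631\u0627",
    "\u0627\u0644\u0633\u0645\u0627\u0621 \u0632\u0631\u0642\u0627\u0621",
]

# ascii() returns the repr of the list with every non-ASCII character
# escaped, similar to what Python 2 shows when a list of unicode
# strings is printed directly.
escaped = ascii(docs)

# Joining gives one string that holds the real characters. Printing it
# shows readable Arabic, assuming the terminal uses a UTF-8 locale.
readable = "\n".join(docs)

print(escaped)
print(readable)
```

The strings are the same in both cases. Only the representation chosen for display differs.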
