I want to develop an Android app that summarizes user-entered text (for example, a news article)


I have researched extractive and abstractive summarization methods. Because of the many disadvantages of abstractive summarization, I would like to do extractive summarization, and I want to do it with a supervised learning method. In my research on extractive summarization I kept coming across the TextRank algorithm, but that is an unsupervised learning method. Is supervised extractive summarization possible? And can I run TextRank on a dataset containing, for example, 15,000 samples?

Please disregard the code below. It is not relevant to the question and is included only so that the question could be posted.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Load the 100-dimensional GloVe vectors into a word -> vector dictionary.
word_embeddings = {}
with open('/content/drive/MyDrive/MetinAnalizi/glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word_embeddings[values[0]] = np.asarray(values[1:], dtype='float32')

# Pairwise cosine similarity between sentence vectors.
# `sentences` and `sentence_vectors` are assumed to be defined earlier (not shown).
sim_mat = np.zeros([len(sentences), len(sentences)])
for i in range(len(sentences)):
    for j in range(len(sentences)):
        if i != j:
            sim_mat[i][j] = cosine_similarity(
                sentence_vectors[i].reshape(1, 100),
                sentence_vectors[j].reshape(1, 100),
            )[0, 0]

There is 1 answer

Answered by Paco

There's a wide variety of text summarization methods, and the use of deep learning in NLP (language models, transformers, etc.) since late 2017 has led to many advancements.

Some of the trade-offs here come down to quality vs. cost. For example, extractive summarization with TextRank is relatively inexpensive and does not require a trained model. OTOH, abstractive summarization approaches with DL models tend to be much more expensive, though they also tend to produce better results.
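To make the "no trained model" point concrete, here is a minimal sketch of TextRank-style sentence extraction using only the Python standard library. It is an illustration under simplifying assumptions, not PyTextRank's implementation: the function names (`split_sentences`, `overlap_similarity`, `textrank_summary`) are hypothetical, the sentence splitter is naive, stop words are not removed, and similarity is the word-overlap measure described in the original TextRank paper.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences have a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)
    if n <= num_sentences:
        return sentences

    # Weighted, undirected sentence graph.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tokens[i] and tokens[j]:
                w = overlap_similarity(tokens[i], tokens[j])
                weights[i][j] = weights[j][i] = w
    out_sum = [sum(row) for row in weights]

    # PageRank by power iteration; isolated sentences keep only the base score.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `textrank_summary(article_text, num_sentences=3)` returns three sentences copied verbatim from the input. Nothing is trained, which is why the per-document cost stays low.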

In PyTextRank we have implemented several algorithm variants, which produce different kinds of extractive summarization depending on the intended use case. News article summaries might prefer PositionRank, while research article summaries might prefer Biased TextRank. This is due to the kinds of phrases that are likely to be emphasized, given the typical style and structure of writing in those domains.
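As a rough sketch of how switching between those variants might look, the helper below picks a PyTextRank pipeline component by domain. The `summarize` function, the `PIPE_FOR_DOMAIN` mapping, and the default parameter values are hypothetical choices for illustration. It assumes `spacy`, `pytextrank`, and the `en_core_web_sm` model are installed; check the PyTextRank documentation for the current component names and options.

```python
# Hypothetical mapping from writing domain to a PyTextRank component name.
PIPE_FOR_DOMAIN = {
    "news": "positionrank",
    "research": "biasedtextrank",
    "default": "textrank",
}


def summarize(text, domain="default", focus=None, limit_sentences=3):
    """Extractive summary using the variant chosen for `domain`."""
    # Imported lazily so the module loads even if the packages are missing.
    import spacy
    import pytextrank  # noqa: F401  (registers the pipeline factories)

    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe(PIPE_FOR_DOMAIN.get(domain, "textrank"))
    doc = nlp(text)

    # Biased TextRank only differs from plain TextRank once a focus is set.
    if domain == "research" and focus:
        doc._.textrank.change_focus(focus, bias=10.0, default_bias=0.0)

    sentences = doc._.textrank.summary(
        limit_phrases=15, limit_sentences=limit_sentences
    )
    return [sent.text for sent in sentences]
```

For example, `summarize(article_text, domain="news")` would rank phrases with PositionRank, while `summarize(paper_text, domain="research", focus="graph algorithms")` would bias the ranking toward the given focus. Running both on a handful of your own documents is a cheap way to compare them.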

My advice is to experiment and see what fits your needs best. If you have many articles to summarize and want to keep the budget low, then TextRank might work well. If you need more fluent, natural-sounding text in the summaries, perhaps abstractive summarization is needed.