Text summarization in a "document" without sentences

Asked by MRicci on 05 August 2021 at 13:40

I have a non-linguistic corpus of ~100 "documents", each comprising a sequence of ~10k "words" (i.e. I have a set of ~100 integer sequences). I can learn good doc2vec embeddings that respect known classes in the corpus. I'm now interested in summarizing these documents to help explain which motifs are not only representative of each document but also discriminative between classes.

I am primarily familiar with TextRank as an extractive summarization method, but it typically relies on sentences (i.e. subsequences that end with a period) as a sensible atom for the underlying node-ranking algorithm. In my case there are no sentences, per se, so the units to rank (the "tokens" of a summary) are not known in advance.

Are there any summarization methods that take this into account? So far, I have tried using TextRank on all n-grams for a fixed n, but this precludes summaries involving tokens of different lengths, which happens to be crucial in my setting. Are there any multi-scale summarization methods, for instance?
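For concreteness, the fixed-n attempt described above can be sketched roughly as follows. This is a minimal illustration, not a record of the exact setup: the function name `ngram_textrank`, the co-occurrence `window`, and the `damping` value are assumptions made for the example. It builds an undirected co-occurrence graph over the n-grams of one integer sequence and ranks them with a plain PageRank power iteration.

```python
from collections import defaultdict


def ngram_textrank(seq, n=3, window=2, damping=0.85, iters=50):
    """Rank the n-grams of an integer sequence, TextRank-style.

    Hypothetical helper: n-grams whose start positions lie within
    `window` of each other are linked, then scored with PageRank.
    """
    grams = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    nodes = list(set(grams))
    if not nodes:
        return {}

    # Undirected, weighted co-occurrence graph between distinct n-grams.
    weights = defaultdict(lambda: defaultdict(float))
    for i, g in enumerate(grams):
        for j in range(i + 1, min(i + window + 1, len(grams))):
            h = grams[j]
            if g != h:
                weights[g][h] += 1.0
                weights[h][g] += 1.0

    # Total edge weight per node, used to normalise outgoing rank.
    out_weight = {g: sum(weights[g].values()) for g in nodes}

    # Power iteration for weighted PageRank.
    score = {g: 1.0 / len(nodes) for g in nodes}
    for _ in range(iters):
        new_score = {}
        for g in nodes:
            incoming = sum(
                score[h] * w / out_weight[h] for h, w in weights[g].items()
            )
            new_score[g] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


if __name__ == "__main__":
    toy = [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5]
    ranked = sorted(ngram_textrank(toy, n=2).items(), key=lambda kv: -kv[1])
    for gram, value in ranked:
        print(gram, round(value, 3))
```

Because `n` is a single fixed value, every candidate in the ranking has the same length, which is exactly the limitation in question.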


There are 0 answers
