I have a non-linguistic corpus of ~100 "documents", each comprising a sequence of ~10k "words" (i.e. I have a set of ~100 integer sequences). I can learn good doc2vec embeddings that respect known classes in the corpus. I'm now interested in summarizing these documents to help explain which motifs are not only representative of each document but also discriminative between classes.
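For concreteness, here is roughly how I am producing the embeddings, treating each integer as a "word" and feeding the stringified sequences to gensim's Doc2Vec; the toy corpus, tags, and hyperparameters below are placeholders rather than my actual settings:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# corpus: list of ~100 integer sequences, each ~10k tokens (toy stand-in here)
corpus = [[3, 17, 42, 17, 8], [42, 42, 5, 17, 3]]

# gensim expects string tokens, so stringify the integers
tagged = [
    TaggedDocument(words=[str(t) for t in seq], tags=[f"doc_{i}"])
    for i, seq in enumerate(corpus)
]

model = Doc2Vec(tagged, vector_size=64, window=5, min_count=1, epochs=40)
doc_vec = model.dv["doc_0"]  # embedding for the first document
```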
I am primarily familiar with TextRank as an extractive summarization method, but it typically relies on sentences (i.e. subsequences ending with a period) as the atoms for the underlying node-ranking algorithm. In my case there is no analogous atom: the sequences have no delimiters, so candidate subsequences are not known in advance.
Are there any summarization methods that take this into account? So far, I have tried running TextRank on all n-grams for a fixed n, but this precludes summaries involving motifs of different lengths, which happens to be crucial in my setting. Are there any multi-scale summarization methods, for instance?
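For reference, a minimal sketch of the fixed-n attempt described above: frequent n-grams are embedded by inferring vectors with the trained Doc2Vec model, connected by cosine similarity, and ranked with PageRank (the usual TextRank recipe). The function and parameter names are my own, and `model` is assumed to be the Doc2Vec model from the earlier sketch:

```python
from collections import Counter

import networkx as nx
import numpy as np

def rank_ngrams(model, sequence, n=5, top_candidates=200, top_k=10):
    # collect the most frequent n-grams as candidate "sentences"
    grams = Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))
    candidates = [g for g, _ in grams.most_common(top_candidates)]

    # embed each n-gram by inferring a vector from its stringified tokens
    vecs = np.array([model.infer_vector([str(t) for t in g]) for g in candidates])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

    # similarity graph over n-grams, edges weighted by cosine similarity
    sim = vecs @ vecs.T
    G = nx.Graph()
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if sim[i, j] > 0:
                G.add_edge(i, j, weight=float(sim[i, j]))

    # rank n-grams by PageRank and return the top k as the "summary"
    scores = nx.pagerank(G, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [candidates[i] for i in ranked[:top_k]]
```

The problem is baked into the first line of the function: a single n fixes the scale of every candidate motif.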