I have some technical documents from which I need to extract the text describing a specific set of procedures. Is there an easy off-the-shelf way to 'show' a language model examples of the text to be extracted, along with each of the documents, and then have it extract that text programmatically?
I was thinking of taking each paragraph and averaging the word embeddings within it to create a 'paragraph embedding', then comparing those to the 'paragraph embeddings' of the extracted text in the training set, but I didn't know if there was a more robust way of doing that.
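To make the idea concrete, here is a rough sketch of what I have in mind. The tiny 3-dimensional word vectors are invented purely for illustration (real ones would come from a pretrained model such as word2vec or GloVe), and the function names and the 0.9 threshold are placeholders I made up, not anything from a library.

```python
import math

# Toy word vectors, invented for illustration only (an assumption).
# Real vectors would be loaded from a pretrained embedding model.
TOY_VECTORS = {
    "remove": [0.9, 0.1, 0.0],
    "install": [0.8, 0.2, 0.1],
    "bolt": [0.7, 0.3, 0.0],
    "valve": [0.6, 0.4, 0.1],
    "the": [0.1, 0.1, 0.1],
    "was": [0.1, 0.2, 0.2],
    "company": [0.0, 0.2, 0.9],
    "founded": [0.1, 0.1, 0.8],
}


def paragraph_embedding(text):
    """Average the vectors of all known words; None if no word is known."""
    words = text.lower().split()  # naive whitespace tokenization
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_matching(paragraphs, examples, threshold=0.9):
    """Keep paragraphs whose best similarity to any example meets the threshold."""
    example_vecs = [v for v in map(paragraph_embedding, examples) if v is not None]
    matches = []
    for paragraph in paragraphs:
        vec = paragraph_embedding(paragraph)
        if vec is None:
            continue  # no known words, nothing to compare
        best = max((cosine(vec, e) for e in example_vecs), default=0.0)
        if best >= threshold:
            matches.append((paragraph, best))
    return matches


examples = ["remove the bolt"]  # text already extracted by hand
paragraphs = ["install the valve", "the company was founded"]
matches = extract_matching(paragraphs, examples)
```

With these made-up vectors, only the procedure-like paragraph passes the threshold. Note that averaging discards word order and lets common words dilute the signal, which is part of why I suspect this is not very robust.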