I've trained an Azure LUIS model that takes sentences as input, extracts the key information, and returns a JSON response.
It works fine for short sentences. Now I want it to take a whole document (PDF, DOCX), analyze all of its pages, and extract the required information (such as StartingDate, EndingDate, CompanyName, etc.). Is it possible to do that with some additions?
Or is there any guidance on how I could analyze the whole document and extract the key information?
Any kind of information would be appreciated! Thank you.
@Farhan Mubasher LUIS works well when you pass in sentences or utterances from which it can extract information such as dates and names as entities. Most of these are available as prebuilt entities that you can add to your model to extract them from an utterance.
If you are planning to use a whole document, such as a multi-page PDF, it is easier to first extract the text with a service like Form Recognizer or the Read API of Azure Computer Vision. With some preprocessing, such as splitting the extracted text into sentences, you can then pass those sentences to your trained LUIS model and process the responses.
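As a minimal sketch of that preprocessing step, the Python below collects the line texts from a Read API result and splits them into sentence-sized utterances. It assumes the v3.x REST result shape (`analyzeResult.readResults[].lines[].text`) and a 500-character utterance limit for LUIS; both are assumptions to check against the API version you use. The function names and the naive regex sentence splitter are illustrative only, not part of any Azure SDK.

```python
import re
from typing import Iterable

# Assumption: LUIS rejects utterances longer than 500 characters, so chunks are capped there.
MAX_UTTERANCE_CHARS = 500


def lines_from_read_result(result: dict) -> list[str]:
    """Collect line texts from an (assumed) Read API v3.x JSON result, page by page."""
    lines = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines


def to_utterances(lines: Iterable[str], max_chars: int = MAX_UTTERANCE_CHARS) -> list[str]:
    """Join OCR lines, split into sentences, and cap each piece at max_chars."""
    text = " ".join(l.strip() for l in lines if l.strip())
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    utterances = []
    for s in sentences:
        s = s.strip()
        # Hard-split sentences that are still too long, preferring the last space before the limit.
        while len(s) > max_chars:
            cut = s.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            utterances.append(s[:cut].strip())
            s = s[cut:].strip()
        if s:
            utterances.append(s)
    return utterances
```

Each resulting utterance would then be sent to your LUIS prediction endpoint one at a time, and the entities from all responses merged (for example, keeping the first date found for StartingDate). The naive splitter will mis-split abbreviations such as "Ltd. Co.", so a real pipeline may need a proper sentence tokenizer.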
If your end goal is to extract information like dates and company information from documents of a certain format, Form Recognizer works well. You only need to train a custom model with some documents of a similar format and then call the Analyze API, which returns the extracted information as labeled fields in the JSON response. Please check out the Form Recognizer labeling tool, which is very simple to set up and use.
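To illustrate what "labels in the JSON response" can look like, here is a small Python sketch that pulls the labeled values (e.g. StartingDate, EndingDate, CompanyName) out of an Analyze result. It assumes the v2.1 REST response shape for custom models (`analyzeResult.documentResults[].fields`, where each field carries `text` and `confidence`); newer API versions use a different shape, so treat the key names as assumptions. The function name and the confidence threshold are hypothetical.

```python
def extract_labeled_fields(result: dict, min_confidence: float = 0.0) -> dict[str, str]:
    """Map each labeled field name to its extracted text (assumed v2.1 response shape)."""
    fields_out = {}
    for doc in result.get("analyzeResult", {}).get("documentResults", []):
        for name, field in doc.get("fields", {}).items():
            # Defensively skip fields with no value for this document.
            if field is None:
                continue
            # Drop low-confidence predictions so they can be reviewed separately.
            if field.get("confidence", 0.0) < min_confidence:
                continue
            fields_out[name] = field.get("text", "")
    return fields_out
```

Filtering on `confidence` is one simple design choice: values below the threshold can be routed to manual review instead of being trusted blindly.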