I'm trying to query the contents of emails using Azure Cognitive Search, and I want to figure out whether this technology is suitable for what I want to do. Emails can contain both text and HTML representations and are written in natural language. As I am new to this, I have some questions:
- Can I use Azure Cognitive Search to filter out data, such as dates, names and numbers? For example, I want a user to submit an email saying "I want 5 apples on the 20th of July", and my software to update itself with data such as "Apples, 5, 20/07"
- How do I handle HTML contents in emails?
Thank you. I am completely new to this sort of technology, so any suggestions are more than welcome.
To extract data such as dates, names, and numbers, you can use the AI enrichment feature of Azure Cognitive Search, which runs during indexing.
AI enrichment allows you to extract text and information from content using built-in skills from Microsoft, such as text translation or Optical Character Recognition (OCR), or custom skills that you provide.
For specific entities, you can create custom skills that extract information such as dates, names, and numbers from the email content. For a reference implementation, see the Custom Entity Lookup skill in Azure Cognitive Search.
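To give a feel for what a custom skill does, here is a minimal sketch in Python of the logic you might host behind a custom Web API skill. It assumes the request and response use the `values` / `recordId` / `data` / `errors` / `warnings` envelope that custom skills exchange with the indexer. The names `extract_order` and `run_skill`, the input field `text`, and the regular expression itself are hypothetical and only cover sentences shaped like your apples example; real emails would need something more robust.

```python
import re
from typing import Any, Dict, Optional

# Month names mapped to their numbers (1-12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Hypothetical pattern: "<quantity> <item> on the <day>th of <month>".
ORDER_PATTERN = re.compile(
    r"(?P<quantity>\d+)\s+(?P<item>[A-Za-z]+)\s+on\s+the\s+"
    r"(?P<day>\d{1,2})(?:st|nd|rd|th)?\s+of\s+(?P<month>[A-Za-z]+)",
    re.IGNORECASE,
)


def extract_order(text: str) -> Optional[Dict[str, Any]]:
    """Return item, quantity and DD/MM date, or None if nothing matches."""
    match = ORDER_PATTERN.search(text)
    if match is None:
        return None
    month = MONTHS.get(match.group("month").lower())
    if month is None:
        return None
    day = int(match.group("day"))
    return {
        "item": match.group("item").capitalize(),
        "quantity": int(match.group("quantity")),
        "date": f"{day:02d}/{month:02d}",
    }


def run_skill(request: Dict[str, Any]) -> Dict[str, Any]:
    """Process one custom-skill request body and build the response body."""
    results = []
    for record in request.get("values", []):
        text = record.get("data", {}).get("text", "")
        result = {
            "recordId": record["recordId"],
            "data": {},
            "errors": [],
            "warnings": [],
        }
        order = extract_order(text)
        if order is None:
            result["warnings"].append({"message": "No order found in text."})
        else:
            result["data"] = order
        results.append(result)
    return {"values": results}


if __name__ == "__main__":
    body = {"values": [{"recordId": "1",
                        "data": {"text": "I want 5 apples on the 20th of July"}}]}
    print(run_skill(body))
```

In a real deployment this function would sit behind an HTTPS endpoint (for example an Azure Function) that the skillset calls for each document.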
To handle HTML content, see Index data from Azure Blob Storage: the blob indexer can extract text from a range of document formats, including HTML, which is the relevant one in your scenario.
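If you want to see locally what "extracting text from HTML" amounts to, the following sketch strips markup with Python's standard `html.parser` module. It is only an illustration of the idea and not how the blob indexer is implemented; the names `html_to_text` and `_TextExtractor` are made up for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of script/style tags."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment as a single line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    email_html = "<html><body><p>I want <b>5</b> apples on the 20th of July</p></body></html>"
    print(html_to_text(email_html))
```

Joining chunks with a space keeps adjacent paragraphs apart, at the cost of splitting a word that has a tag in the middle of it; for email bodies that trade-off is usually acceptable.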
For details on indexing and on creating a skillset for documents in different formats, see Quickstart: Create a skillset in the Azure portal.
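If you later move from the portal to code, a skillset is just a JSON document. The sketch below builds one as a Python dictionary, using the built-in Entity Recognition skill to pull people, quantities, and dates out of `/document/content`. The field names and the `@odata.type` value reflect my understanding of the skillset schema, so please verify them against the current reference before relying on them; `build_skillset`, the skillset name, and the `targetName` values are placeholders chosen for this example.

```python
import json
from typing import Any, Dict


def build_skillset(name: str) -> Dict[str, Any]:
    """Build a skillset definition that extracts people, quantities and dates."""
    return {
        "name": name,
        "description": "Extract people, quantities and dates from email text",
        "skills": [
            {
                # Built-in entity recognition skill (assumed type name).
                "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
                "context": "/document",
                "categories": ["Person", "Quantity", "DateTime"],
                "defaultLanguageCode": "en",
                # The blob indexer places extracted text in /document/content.
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [
                    {"name": "persons", "targetName": "people"},
                    {"name": "quantities", "targetName": "quantities"},
                    {"name": "dateTimes", "targetName": "dates"},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON body you would send when creating the skillset.
    print(json.dumps(build_skillset("email-skillset"), indent=2))
```

The outputs would then be mapped to index fields in the indexer definition, which the quickstart walks through.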