I am building a Question and Answer experience using ChatCompletion in Azure OpenAI, restricted to specific business documents. I have the embedding part working: embedding the documents and the question lets me retrieve the relevant content, and the user can then ask questions against that content only.
What I want is to cache the questions and answers, so that if somebody asks a similar question we return the cached answer instead of calling OpenAI again. This would reduce cost.
Azure OpenAI (AOAI) calls are stateless, so to build a "cache" layer for the scenario you describe, you would store the embeddings of previously answered questions in Cognitive Search (or another vector database) and check incoming questions against them before calling the model.
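A minimal sketch of that semantic-cache idea is below. It is not an Azure API example: in a real deployment the embeddings would come from the Azure OpenAI embeddings endpoint and the similarity search from a Cognitive Search vector index; here the embeddings are plain Python lists and the search is a brute-force cosine-similarity scan, with the threshold value chosen arbitrarily for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    """Stores (question embedding, answer) pairs. A lookup returns the
    cached answer whose embedding is most similar to the query, but only
    if the similarity clears the threshold; otherwise it returns None,
    signalling the caller to go to OpenAI and then add() the new result."""

    def __init__(self, threshold=0.9):  # threshold is an assumed tuning knob
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) tuples

    def lookup(self, embedding):
        best_answer, best_score = None, 0.0
        for cached_embedding, answer in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None

    def add(self, embedding, answer):
        self.entries.append((embedding, answer))
```

The calling pattern is: embed the user's question, try `lookup()`, and only on a miss call ChatCompletion and `add()` the result, so near-duplicate questions never reach the model.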
We have a sample implementation using Cognitive Search at akshata29/chatpdf (Chat and Ask on your own data), an accelerator for quickly uploading your own enterprise data and using Azure OpenAI services to chat with that data and ask questions.