Creating a knowledge graph index from an XML (DEXPI) file

Context:

I have an XML file (DEXPI) that I want to use as the data source for a Retrieval-Augmented Generation (RAG) system built with llama-index, so that the correct context is retrieved for any natural-language query.

Current Issue:

  • I cannot treat the XML file like a plain text document.
  • llama-index does not provide a splitter for XML data, so the XML cannot be divided correctly into chunks (nodes).
  • Even with a custom chunker/splitter, the chunks would still contain a lot of unwanted noise, such as XML tags and other XML-related metadata.
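
To make the third point concrete, here is a minimal sketch of what a custom chunker could look like, using only the Python standard library. It emits one chunk per top-level element and renders attributes as plain "key=value" text so no angle-bracket markup survives. The sample XML is a hypothetical, heavily simplified stand-in and not the real DEXPI schema; the names `element_to_text` and `chunk_xml` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample (NOT the real DEXPI schema).
SAMPLE_XML = """
<PlantModel>
  <Equipment ID="E1" TagName="P-101" ComponentClass="CentrifugalPump">
    <Nozzle ID="N1" TagName="N1-P101"/>
  </Equipment>
  <Equipment ID="E2" TagName="T-201" ComponentClass="Tank"/>
</PlantModel>
"""


def element_to_text(elem, depth=0):
    """Render an element and its descendants as indented, tag-free lines."""
    attrs = ", ".join(f"{key}={value}" for key, value in elem.attrib.items())
    line = "  " * depth + (f"{elem.tag}: {attrs}" if attrs else elem.tag)
    text = (elem.text or "").strip()
    if text:
        line += f" | {text}"
    lines = [line]
    for child in elem:
        lines.extend(element_to_text(child, depth + 1))
    return lines


def chunk_xml(xml_string):
    """Return one chunk (text plus metadata) per direct child of the root."""
    root = ET.fromstring(xml_string.strip())
    chunks = []
    for child in root:
        chunks.append({
            "text": "\n".join(element_to_text(child)),
            "metadata": {"tag": child.tag, "id": child.get("ID")},
        })
    return chunks


chunks = chunk_xml(SAMPLE_XML)
for chunk in chunks:
    print(chunk["metadata"], chunk["text"], sep="\n", end="\n\n")
```

Each chunk's text and metadata could then be wrapped in whatever document or node object llama-index expects. Splitting on top-level elements is an assumption; a real DEXPI file may need a different granularity.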

What did I try?

I am considering two approaches to solve this:

Approach 1:

Convert the XML into SQL tables (or CSVs), convert these tables into natural-language English text, and then pass this text to llama-index for further processing. While building the knowledge graph index, llama-index would automatically identify the vertices (entities) and the edges (relationships) between them.
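
A rough sketch of the first two steps of Approach 1 (XML to rows/CSV, then rows to sentences), again with the standard library only. The element and attribute names (`Equipment`, `Nozzle`, `TagName`, `ComponentClass`) and the sentence template are assumptions for illustration, not taken from the actual DEXPI schema; the hand-off to llama-index is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample (NOT the real DEXPI schema).
SAMPLE_XML = """
<PlantModel>
  <Equipment ID="E1" TagName="P-101" ComponentClass="CentrifugalPump">
    <Nozzle ID="N1"/>
  </Equipment>
  <Equipment ID="E2" TagName="T-201" ComponentClass="Tank"/>
</PlantModel>
"""


def xml_to_rows(xml_string):
    """Flatten every Equipment element into one table row (a dict)."""
    root = ET.fromstring(xml_string.strip())
    rows = []
    for equipment in root.iter("Equipment"):
        rows.append({
            "id": equipment.get("ID", ""),
            "tag_name": equipment.get("TagName", ""),
            "component_class": equipment.get("ComponentClass", ""),
            "nozzle_count": len(equipment.findall("Nozzle")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def row_to_sentence(row):
    """Turn one row into a plain English sentence via a fixed template."""
    return (
        f"Equipment {row['tag_name']} (ID {row['id']}) is a "
        f"{row['component_class']} with {row['nozzle_count']} nozzle(s)."
    )


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
sentences = [row_to_sentence(row) for row in rows]
print(csv_text)
print("\n".join(sentences))
```

The resulting sentences are what would be handed to llama-index as text. Whether its automatic entity and relationship extraction recovers the plant topology reliably from such sentences is exactly the open question here.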

Approach 2:

Convert the XML into SQL tables (or CSVs), then manually convert these tables into graph database entities and relationships. Then query the graph database using a graph query generated by an LLM.
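
A minimal sketch of the manual entity/relationship extraction in Approach 2, skipping the intermediate SQL tables for brevity. It treats every element with an `ID` attribute as a node, parent-child nesting as a `HAS_<TAG>` edge, and a hypothetical `Connection` element with `FromID`/`ToID` attributes as a `CONNECTED_TO` edge. All of these names are assumptions for illustration; the real DEXPI schema models connectivity differently, and loading the result into an actual graph database is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample (NOT the real DEXPI schema).
SAMPLE_XML = """
<PlantModel>
  <Equipment ID="E1" TagName="P-101" ComponentClass="CentrifugalPump">
    <Nozzle ID="N1"/>
  </Equipment>
  <Equipment ID="E2" TagName="T-201" ComponentClass="Tank">
    <Nozzle ID="N2"/>
  </Equipment>
  <Connection FromID="N1" ToID="N2"/>
</PlantModel>
"""


def build_graph(xml_string):
    """Return (nodes, edges): nodes keyed by ID, edges as (source, type, target)."""
    root = ET.fromstring(xml_string.strip())
    nodes = {}
    edges = []

    def visit(elem, parent_id=None):
        node_id = elem.get("ID")
        if node_id is not None:
            # Every element carrying an ID becomes a graph node.
            nodes[node_id] = {"label": elem.tag, **elem.attrib}
            if parent_id is not None:
                # Nesting under an identified ancestor becomes a containment edge.
                edges.append((parent_id, "HAS_" + elem.tag.upper(), node_id))
        for child in elem:
            visit(child, node_id if node_id is not None else parent_id)

    visit(root)

    # Explicit references between elements become connectivity edges.
    for connection in root.iter("Connection"):
        edges.append(
            (connection.get("FromID"), "CONNECTED_TO", connection.get("ToID"))
        )
    return nodes, edges


nodes, edges = build_graph(SAMPLE_XML)
for source, relation, target in edges:
    print(f"({source})-[{relation}]->({target})")
```

Compared with Approach 1, the relationships here are taken directly from the XML structure rather than inferred from generated text, at the cost of writing and maintaining the mapping by hand.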

My Questions:

  1. Which approach should I choose, and how effective is each likely to be?
  2. Are there better approaches for dealing with XML data when using llama-index?