LlamaIndex SentenceSplitter is limited by metadata length

There is something I don't understand: as far as I know, the metadata is not embedded, so it shouldn't affect the splitting process. However, when I try to implement a small-to-big retriever with a small chunk size, I get this error message:

"Metadata length (130) is longer than chunk size (128). Consider increasing the chunk size or decreasing the size of your metadata to avoid this."

Can you explain why this happens, and how to make the metadata not affect the splitting process?
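
From digging around, the splitter appears to be metadata-aware: it serializes the node's metadata, tokenizes that string, and subtracts its length from chunk_size before splitting the text itself, so that text plus metadata still fit in a chunk at embedding time. A quick way to inspect the string it seems to budget for (a sketch, assuming the MetadataMode enum from llama_index.core.schema):

from llama_index.core.schema import MetadataMode

# The splitter budgets for whichever serialized metadata view is longer:
# the embedding view or the LLM view.
embed_str = base_node.get_metadata_str(mode=MetadataMode.EMBED)
llm_str = base_node.get_metadata_str(mode=MetadataMode.LLM)
print(max(embed_str, llm_str, key=len))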

Here is the code I use:

from llama_index.core.node_parser import SentenceSplitter  # llama_index >= 0.10
from tqdm import tqdm

sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SentenceSplitter.from_defaults(chunk_size=c, chunk_overlap=20)
    for c in sub_chunk_sizes
]

all_nodes = []
for base_node in tqdm(base_nodes):
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        all_nodes.extend(sub_nodes)  # collect the sub-chunks from each parser
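
(For the actual small-to-big linkage, the LlamaIndex recursive-retriever example wraps each sub-chunk as an IndexNode pointing back at its base node instead of collecting the raw sub-nodes, roughly:)

from llama_index.core.schema import IndexNode

# Wrap each small chunk as an IndexNode referencing its larger parent node,
# so retrieving a small chunk can fetch the big one.
sub_inodes = [IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes]
all_nodes.extend(sub_inodes)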

So far the only workarounds I have found are reducing the metadata size or increasing the chunk size.
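
Another option that seems to work, if anyone else hits this: exclude every metadata key from both the embed and LLM views before splitting, so the serialized metadata string the splitter budgets for is empty. A sketch using the standard excluded_*_metadata_keys fields on the node:

for base_node in base_nodes:
    # The metadata stays stored on the node; it is only hidden from the two
    # views the splitter measures, so it no longer counts against chunk_size.
    base_node.excluded_embed_metadata_keys = list(base_node.metadata.keys())
    base_node.excluded_llm_metadata_keys = list(base_node.metadata.keys())

The trade-off is that the metadata is then also left out of the text sent to the embedding model and the LLM, which may or may not be acceptable.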
