LlamaIndex SentenceSplitter is limited by metadata length

There is something I don't understand: as far as I know, the metadata is not embedded, so it shouldn't affect the splitting process. However, when I try to implement a small-to-big retriever with a small chunk size, I get this error message:

"Metadata length (130) is longer than chunk size (128). Consider increasing the chunk size or decreasing the size of your metadata to avoid this."

Can you explain why this happens, and how to make the metadata not affect the splitting process?
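
From digging around, the splitter appears to be metadata-aware: it serializes the node's metadata, tokenizes that string, and subtracts its length from chunk_size before splitting the text itself, so that text plus metadata still fit in a chunk at embedding time. A quick way to inspect the string it seems to budget for (a sketch, assuming the MetadataMode enum from llama_index.core.schema):

from llama_index.core.schema import MetadataMode

# The splitter budgets for whichever serialized metadata view is longer:
# the embedding view or the LLM view.
embed_str = base_node.get_metadata_str(mode=MetadataMode.EMBED)
llm_str = base_node.get_metadata_str(mode=MetadataMode.LLM)
print(max(embed_str, llm_str, key=len))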

Here is the code I use:

from llama_index.core.node_parser import SentenceSplitter  # llama_index >= 0.10
from tqdm import tqdm

sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SentenceSplitter.from_defaults(chunk_size=c, chunk_overlap=20)
    for c in sub_chunk_sizes
]

all_nodes = []
for base_node in tqdm(base_nodes):
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        all_nodes.extend(sub_nodes)  # collect the sub-chunks from each parser
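
(For the actual small-to-big linkage, the LlamaIndex recursive-retriever example wraps each sub-chunk as an IndexNode pointing back at its base node instead of collecting the raw sub-nodes, roughly:)

from llama_index.core.schema import IndexNode

# Wrap each small chunk as an IndexNode referencing its larger parent node,
# so retrieving a small chunk can fetch the big one.
sub_inodes = [IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes]
all_nodes.extend(sub_inodes)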

So far the only workarounds I have found are reducing the metadata size or increasing the chunk size.
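
Another option that seems to work, if anyone else hits this: exclude every metadata key from both the embed and LLM views before splitting, so the serialized metadata string the splitter budgets for is empty. A sketch using the standard excluded_*_metadata_keys fields on the node:

for base_node in base_nodes:
    # The metadata stays stored on the node; it is only hidden from the two
    # views the splitter measures, so it no longer counts against chunk_size.
    base_node.excluded_embed_metadata_keys = list(base_node.metadata.keys())
    base_node.excluded_llm_metadata_keys = list(base_node.metadata.keys())

The trade-off is that the metadata is then also left out of the text sent to the embedding model and the LLM, which may or may not be acceptable.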
