Single or multiple key-value pair data structure for vector search with Marqo?


I'm implementing vector search for a work project using Marqo Cloud. I have some data for my documents (products) that can either be structured as:

A single key-value pair, for example: Tags: red, spotty, nylon, casual

Or multiple key-value pairs with each tag title, for example: Color: red Design: spotty Material: nylon Style: casual

Will one of these data structures perform better than the other in vector search? Or is the difference likely to be negligible?


There are 2 answers

crabdog On

If these are all tensor fields (used during vector search), here are some things to be aware of:

  • The single k-v pair will only work if the value is a string. Lists of strings are only supported as non-tensor fields used for filtering. So this would work: Tags: "blue, patterned, cotton, elegant"
  • Because there is only one k-v pair, only a single tensor field is generated. Because the string is short, you will likely only get a single vector for this tensor field.
  • For the multiple k-v pair case, each k-v pair will generate a tensor field (and 1 vector per tensor field).
  • The extra vectors will require more RAM and, depending on the scale of your index, may make search slightly slower.
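
To make the points above concrete, here is a minimal sketch of the two document shapes and the tensor fields each one implies. The `_id` value, the `TagList` field, and the `index_products` helper are hypothetical names used for illustration. The helper assumes an index handle from the Marqo Python client (for example `marqo.Client(...).index("products")`) whose `add_documents` accepts a `tensor_fields` argument; check this against the client version you are running.

```python
# Option 1: one string field, so a single tensor field is generated.
single_field_doc = {
    "_id": "product-1",  # hypothetical document ID
    "Tags": "red, spotty, nylon, casual",
}
single_tensor_fields = ["Tags"]

# Option 2: one field per attribute, so four tensor fields (one vector each).
multi_field_doc = {
    "_id": "product-1",
    "Color": "red",
    "Design": "spotty",
    "Material": "nylon",
    "Style": "casual",
}
multi_tensor_fields = ["Color", "Design", "Material", "Style"]

# A list of strings is left out of tensor_fields, so it is only usable for filtering.
# "TagList" is a hypothetical field name.
filter_only_doc = {**single_field_doc, "TagList": ["red", "spotty", "nylon", "casual"]}


def index_products(index, docs, tensor_fields):
    """Send docs to a Marqo index handle, embedding only the named tensor fields."""
    return index.add_documents(docs, tensor_fields=tensor_fields)
```

With option 2 the number of vectors stored per document grows with the number of attributes, which is where the extra RAM usage comes from.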

Vector search performs better when the model that is creating the embedding has some context. So an embedding generated from the string "blue, patterned, cotton, elegant" will likely have better recall performance than 4 separate embeddings that are each generated from a single word. So for recall, speed and resource performance the first option will work better in most cases.
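
If your source data already has the separate attributes, one way to get to the first option is to join the values into a single string before indexing. This is only a sketch; `to_single_tag_string` is a hypothetical helper name, not part of Marqo.

```python
def to_single_tag_string(attributes):
    """Join attribute values into one comma-separated string, in insertion order."""
    return ", ".join(str(value) for value in attributes.values())


product_attributes = {
    "Color": "blue",
    "Design": "patterned",
    "Material": "cotton",
    "Style": "elegant",
}

# One document with one short string field to embed.
doc = {"Tags": to_single_tag_string(product_attributes)}
print(doc["Tags"])  # blue, patterned, cotton, elegant
```

You can still keep the original attributes alongside `Tags` as non-tensor fields if you need them for filtering.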

Nube Colectiva On

What crabdog says is good: reads should be fast and resource consumption should be kept to a minimum.

I don't know exactly how your project is structured, but remember that you can use data cachers, queues and other techniques to mi