I'm implementing vector search for a work project using Marqo Cloud. I have some data for my documents (products) that can either be structured as:
1. A single key-value pair, for example: Tags: red, spotty, nylon, casual
2. Multiple key-value pairs, one per tag title, for example: Color: red, Design: spotty, Material: nylon, Style: casual
Will one of these data structures perform better than the other in vector search? Or is the difference likely to be negligible?
If these are all tensor fields (used during vector search), here are some things to be aware of.
Tags: "blue, patterned, cotton, elegant"
Vector search tends to perform better when the model creating the embedding has some context to work with. So an embedding generated from the string
"blue, patterned, cotton, elegant"
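will likely have better recall than four separate embeddings that are each generated from a single word. A single tensor field also means one piece of text to embed per document instead of four, so there is less to compute, store, and search. So for recall, speed, and resource usage, the first option (a single Tags field) will work better in most cases.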
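
As a rough illustration of the two structures, here is a minimal Python sketch that collapses the separate attribute fields into one Tags string before indexing. The helper name `to_single_tags_field`, the `_id` value, and the field names are assumptions made for the example, not anything from your data. The sketch only builds the documents. It does not call Marqo.

```python
def to_single_tags_field(product: dict, attribute_keys: list[str]) -> dict:
    """Collapse separate attribute fields into one comma-separated 'Tags' string.

    Hypothetical helper for illustration only; not part of any Marqo API.
    """
    # Join the non-empty attribute values in the order given.
    tags = ", ".join(str(product[key]) for key in attribute_keys if product.get(key))
    # Keep every other field (e.g. the document id) unchanged.
    rest = {key: value for key, value in product.items() if key not in attribute_keys}
    return {**rest, "Tags": tags}


# Option 2 from the question: one field per attribute (hypothetical product).
multi_field_doc = {
    "_id": "p1",
    "Color": "blue",
    "Design": "patterned",
    "Material": "cotton",
    "Style": "elegant",
}
attribute_keys = ["Color", "Design", "Material", "Style"]

# Option 1 from the question: a single Tags field.
single_field_doc = to_single_tags_field(multi_field_doc, attribute_keys)

# Field names you would mark as tensor fields in each case.
single_tensor_fields = ["Tags"]        # one text to embed per document
multi_tensor_fields = attribute_keys   # four texts to embed per document

print(single_field_doc)
# {'_id': 'p1', 'Tags': 'blue, patterned, cotton, elegant'}
```

With the Marqo Python client you would then pass a list of such documents to `add_documents` together with the matching `tensor_fields` list. If you also need to filter on individual attributes, you could keep the separate fields on the document and simply leave them out of the tensor fields, though that is a design choice to check against your own queries.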