Similarity search within vector database records

While reading about vector databases and the embeddings used to populate them, I started wondering whether there is a recommended way to find similarities among the records already stored in a vector database.

I expect this to be useful, for example, when we first populate a vector database with existing strings and don't know whether any of those texts are similar to each other.

All of the solutions I could find simply suggest populating the vector database first and then generating the embeddings a second time in order to compare each record against the whole database. Generating the same embeddings twice is a waste of money and time.

My idea so far is to have two databases: a plain-text one (PTDB) and a vector one (VDB). PTDB stores the plain-text strings, which we use to generate embeddings that are then stored in VDB. When we store an embedding there, we record the id of the vector within VDB and store it in PTDB next to the text it corresponds to.

When we want to check for similarities among our strings, we iterate over PTDB, retrieve the vector ids, and use each one as the argument for a search in VDB.
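To make the idea concrete, here is a minimal sketch of what I have in mind. Everything in it is a stand-in: `ptdb` and `vdb` are plain dictionaries rather than real databases, `toy_embed` is a letter-frequency toy rather than a real embedding model, and `search_by_id` is a brute-force scan. Whether a given vector database can search using a stored vector id is something I would still need to check in its documentation.

```python
import math

def toy_embed(text):
    """Toy stand-in for a real embedding model: a letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def populate(texts):
    """Fill both stores; each embedding is computed exactly once."""
    ptdb, vdb = {}, {}  # hypothetical PTDB and VDB
    for i, text in enumerate(texts):
        vector_id = f"vec-{i}"
        vdb[vector_id] = toy_embed(text)
        ptdb[i] = (text, vector_id)  # text stored next to its vector id
    return ptdb, vdb

def search_by_id(vdb, vector_id, top_k=1):
    """Nearest neighbours of an already stored vector, excluding itself."""
    query = vdb[vector_id]  # reuse the stored vector, no re-embedding
    scored = [
        (cosine(query, vec), other_id)
        for other_id, vec in vdb.items()
        if other_id != vector_id
    ]
    scored.sort(reverse=True)
    return scored[:top_k]

def find_similar_pairs(ptdb, vdb, threshold=0.9):
    """Iterate over PTDB and look up each record's neighbours in VDB."""
    id_to_text = {vector_id: text for text, vector_id in ptdb.values()}
    pairs = []
    for text, vector_id in ptdb.values():
        for score, other_id in search_by_id(vdb, vector_id):
            if score >= threshold:
                pairs.append((text, id_to_text[other_id], score))
    return pairs
```

Calling `populate(texts)` followed by `find_similar_pairs(ptdb, vdb)` returns the pairs whose similarity reaches the threshold. Each pair shows up once from each side, so a real version would probably deduplicate them.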

On paper this looks good enough, but I cannot find any existing implementation of it.

Could someone please comment on whether this approach is feasible, or whether there are better solutions for such use cases?
