I want to perform vector search using generative AI embedding models. I am able to do it on plain text; here is the code:
# Load the text embeddings model
import time

import tqdm
from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# Get embeddings for a list of texts, BATCH_SIZE texts per request
BATCH_SIZE = 5

def get_embeddings_wrapper(texts):
    embs = []
    for i in tqdm.tqdm(range(0, len(texts), BATCH_SIZE)):
        time.sleep(1)  # to avoid the quota error
        result = model.get_embeddings(texts[i:i + BATCH_SIZE])
        embs = embs + [e.values for e in result]
    return embs

# df is an existing pandas DataFrame with a "title" column
df = df.assign(embedding=get_embeddings_wrapper(df.title.tolist()))
But suppose the data is in tabular format, in a CSV file or a BigQuery table, which looks like this:
I want to compute embeddings for this data and then perform vector search, so that I can fetch the matching records for queries like the ones below (for example, fetching the top 3 best-matching results and passing them to an LLM to generate the answer):
- Get Manager ID for job name manager
- Can you provide me the job name for Jonas?
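One idea I have considered, sketched below, is to flatten each row into a "column: value" sentence, embed those sentences, and rank them by cosine similarity against the query. This is only a rough sketch under assumptions: the column names (`employee_name`, `job_name`, `manager_id`) and the rows are hypothetical, and `toy_embed` is a bag-of-words stand-in so the example runs without any API call. In practice it would be replaced by a wrapper around `model.get_embeddings` that returns `e.values`, with the similarity computed on those dense vectors instead.

```python
import math
import re
from collections import Counter

# Hypothetical rows; real column names and values would come from the CSV or BigQuery table.
rows = [
    {"employee_name": "Jonas", "job_name": "analyst", "manager_id": 101},
    {"employee_name": "Maria", "job_name": "manager", "manager_id": 100},
    {"employee_name": "Peter", "job_name": "clerk", "manager_id": 101},
]

def row_to_text(row):
    """Serialize one record as 'column: value' pairs so it can be embedded like a sentence."""
    return ", ".join(f"{col.replace('_', ' ')}: {val}" for col, val in row.items())

def toy_embed(texts):
    """Stand-in for the real embedding model: one word-count vector per text."""
    return [Counter(re.findall(r"\w+", t.lower())) for t in texts]

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, texts, embeddings, embed_fn, k=3):
    """Return the k serialized rows most similar to the query."""
    q = embed_fn([query])[0]
    scored = sorted(zip(texts, embeddings), key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [text for text, _ in scored[:k]]

texts = [row_to_text(r) for r in rows]
embeddings = toy_embed(texts)
matches = top_k("Can you provide me job name for Jonas?", texts, embeddings, toy_embed, k=2)
```

The strings in `matches` could then be pasted into the LLM prompt as context. I am not sure whether serializing rows like this is the recommended way to handle tabular data, or whether a brute-force scan like `top_k` should be replaced by a proper vector index once the table is large.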
I got a good understanding of vector search on text from https://github.com/GoogleCloudPlatform/generative-ai/blob/main/vector-search/intro-textemb-vectorsearch.ipynb
But that notebook does not cover tabular data. Please let me know if there is any reference or suggestion on how to do this.