Vector search using the Gecko generative AI model on a DataFrame


I want to perform vector search using generative AI embedding models. I am able to do it on plain text; here is the code:

# Load the text embeddings model
import time

import tqdm
from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# Get embeddings for a list of texts, BATCH_SIZE texts per request
BATCH_SIZE = 5
def get_embeddings_wrapper(texts):
  texts = list(texts)  # accept a pandas Series as well as a plain list
  embs = []
  for i in tqdm.tqdm(range(0, len(texts), BATCH_SIZE)):
    time.sleep(1) # to avoid the quota error
    result = model.get_embeddings(texts[i:i + BATCH_SIZE])
    embs = embs + [e.values for e in result]
  return embs

# df is assumed to be an existing pandas DataFrame with a "title" column
df = df.assign(embedding=get_embeddings_wrapper(df.title))

But suppose the data is in tabular format, in a CSV file or a BigQuery table, like this:

[image of the sample table]

I want to generate embeddings for this data and then perform vector search so that I can fetch the records relevant to queries like the ones below (for example, fetching the top 3 best-matching rows and passing them to an LLM to generate the answer):

  1. Get Manager ID for job name manager
  2. Can you provide me the job name for Jonas?
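
The flow described above (embed each row, fetch the top 3 matches, hand them to an LLM) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed approach: the column names and rows are made up because the sample table is not available, `row_to_text`, `make_toy_embedder`, and `top_k_rows` are hypothetical helpers, and the word-count embedder is a non-semantic stand-in so the example runs without Vertex AI. For real use, the embedding function would be replaced by something like `get_embeddings_wrapper` from the code above.

```python
import math
import re


def row_to_text(row):
    """Serialize one table row as 'column: value' pairs so it can be embedded as text."""
    return ", ".join(f"{col}: {val}" for col, val in row.items())


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def make_toy_embedder(corpus):
    """Stand-in for the real model: word-count vectors over the corpus vocabulary.

    This is NOT semantic; it only lets the example run offline.
    """
    vocab = {tok: i for i, tok in enumerate(sorted({t for text in corpus for t in tokenize(text)}))}

    def embed(texts):
        vectors = []
        for text in texts:
            vec = [0.0] * len(vocab)
            for tok in tokenize(text):
                if tok in vocab:  # words outside the vocabulary are ignored
                    vec[vocab[tok]] += 1.0
            vectors.append(vec)
        return vectors

    return embed


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_rows(query, rows, row_vecs, embed_fn, k=3):
    """Return the k rows whose embeddings are most similar to the query embedding."""
    query_vec = embed_fn([query])[0]
    scored = sorted(
        zip(rows, row_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [row for row, _ in scored[:k]]


# Hypothetical rows; with pandas these could come from df.to_dict("records").
rows = [
    {"name": "Jonas", "job_name": "Analyst", "manager_id": 101},
    {"name": "Maria", "job_name": "Manager", "manager_id": 100},
    {"name": "Chen", "job_name": "Engineer", "manager_id": 101},
    {"name": "Priya", "job_name": "Designer", "manager_id": 102},
]
row_texts = [row_to_text(r) for r in rows]
embed = make_toy_embedder(row_texts)  # for real use: embed = get_embeddings_wrapper
row_vecs = embed(row_texts)

matches = top_k_rows("Can you provide me job name for Jonas?", rows, row_vecs, embed)
context = "\n".join(row_to_text(r) for r in matches)  # text to place in the LLM prompt
```

Keeping the column names in each serialized row is a design choice: a query that mentions "job name" or "Manager ID" can then match the row text itself. The brute-force cosine scan is only reasonable for small tables; for larger ones, the precomputed row embeddings could presumably be loaded into a vector index instead.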

I got a good understanding of vector search from https://github.com/GoogleCloudPlatform/generative-ai/blob/main/vector-search/intro-textemb-vectorsearch.ipynb

But that notebook works on plain text and cannot be applied directly to tabular data. Please let me know if there is any reference or suggestion on how to do this.
