I am working on a FastAPI application in Python that stores OpenAI vector embeddings in a PostgreSQL database using pgvector. I am using SQLAlchemy to connect to the database.

When I try to post data to the database I get the following error:

sqlalchemy.orm.exc.UnmappedInstanceError: Class 'minimalExample.EmbeddingCreate' is not mapped

The embedding data is not being written to the database.

When I send a vector such as [-0.019, -0.017, -0.007] to the POST route defined below, I get the error above. (I know that OpenAI embeddings are larger, e.g. 1536 dimensions with the ada-002 embedding model, but this is just a minimal example.) I post the embedding vector like this:

import requests

emb = {
    "embedding" : [-0.019, -0.017, -0.007]
}

r = requests.post("http://localhost:8000/embeddings/", json=emb)

My expectation is that either the data is written to the database and I get a 200 status back, or, if the data already exists, nothing is written and I get an HTTP 400 status in return.

Here is a minimal example of my FastAPI code that produces the same error:

minimalExample.py :

from typing import List

from fastapi import FastAPI, Depends, HTTPException, Request
from pgvector.sqlalchemy import Vector
from pydantic import BaseModel
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, sessionmaker

app = FastAPI()
Base = declarative_base()

DATABASE_URL = "postgresql://postgres:admin@localhost:5434/vector_db" #5434 because I already have other servers running on 5432 and 5433
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


class Embedding(Base):
    __tablename__ = 'embeddings'
    embedding_id = Column(Integer, primary_key=True, autoincrement=True)
    embedding = Column(Vector(3), nullable=False)
    
class EmbeddingCreate(BaseModel):
    embedding: List[float]

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
        
@app.post("/embeddings/")
async def create_embedding(request: Request, db: Session = Depends(get_db)):
    embedding_data = await request.json()
    embedding = EmbeddingCreate(**embedding_data)
    
    existing_embedding = db.query(Embedding).filter(
        Embedding.embedding == embedding.embedding,
    ).first()
    if existing_embedding:
        raise HTTPException(status_code=400, detail="Embedding already exists")
    
    db.add(embedding)
    db.commit()
    db.refresh(embedding)
    
    new_embedding = db.query(Embedding).filter(Embedding.embedding == embedding.embedding).first()
        
    return {"status": "Embedding added successfully", "status_code": 200, "embedding_id": new_embedding.embedding_id}

For the PostgreSQL database I am using the ankane/pgvector Docker image from Docker Hub. I made sure that the vector extension has been created:

CREATE EXTENSION vector;

Within the database named "vector_db" I created the following table:

CREATE TABLE embeddings (
    embedding_id bigserial PRIMARY KEY,
    embedding vector(1536) NOT NULL
);
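One thing worth checking here: the table above declares vector(1536), while the SQLAlchemy model in minimalExample.py uses Vector(3). pgvector rejects values whose number of dimensions differs from the declared column size, so the three-element test vector would not fit this table as created. A minimal sketch of validating the length before touching the database follows; EXPECTED_DIM and check_dimensions are hypothetical names introduced only for illustration.

```python
from typing import List, Sequence

# Hypothetical constant: it should match both Vector(n) in the ORM model
# and vector(n) in the CREATE TABLE statement.
EXPECTED_DIM = 3


def check_dimensions(vector: Sequence[float], expected: int = EXPECTED_DIM) -> List[float]:
    """Return the vector as a list of floats, or raise if its length is wrong."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return [float(x) for x in vector]
```

For the minimal example, either the table would need to be created with vector(3) or the model would need Vector(1536), so that both sides agree.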

To run this example, use the following command (assuming you are in the same folder as the minimalExample.py file): uvicorn minimalExample:app --reload

I assume that my issue is in the Embedding class, where I defined the column as embedding = Column(Vector(3), nullable=False), because this is where the ORM mapping happens. But I have run out of ideas about what I could do differently here.
