SQLAlchemy: Load directed graph in fewer queries

50 views Asked by At

I have a directed graph structure in my database model, and need to load many ( > 10.000) records. The database is not on the same server as the code, causing a network delay of about 2 ms per query.

My database model is similar to https://docs.sqlalchemy.org/en/20/_modules/examples/graphs/directed_graph.html

I want to load a list of nodes and for each node I need all edges.

This snippet is more ore less the code I would like to execute.

node_list = [1, 2, 3, ... , 1000]
for node in session.query(Node).filter(Node.id.in_(node_list)).all():
  print(f'{node} , {node.higher_neighbors()} , {node.lower_neighbors()}')

For 1000 records, this results in 1 query to load the 1000 nodes, and then 2 queries for each node to load the neighbors. (2N + 1) queries in total. With a cost of 2 ms per query, this code takes about 4 seconds more then nessesary when loading 1000 records.

How should I change the model and/or query to load this data in a single, or a few queries?

This is my actual code. I originally used an association proxy, but this makes no difference in the amount of queries compared to the SQLAlchemy graph example. A postgres database is used.

class Job(Base):
    __tablename__ = 'job'

    id: Mapped[int] = mapped_column(primary_key=True)
    job_name: Mapped[str] = mapped_column(String(256))

    parents = association_proxy("job_parents", "parent")
    children = association_proxy("job_children", "child")

    logs: Mapped[List[JobLog]] = relationship(order_by=TaskLog.id)

    def add_child(self, child, relation):
        JobRelation(parent=self, child=child, relation=relation)

    def add_parent(self, parent, relation):
        JobRelation(parent=parent, child=self, relation=relation)

class JobRelation(Base):
    __tablename__ = 'job_relation'
    parent_id: Mapped[int] = mapped_column(ForeignKey('job.id'), primary_key=True)
    child_id: Mapped[int] = mapped_column(ForeignKey('job.id'), primary_key=True)
    relation: Mapped[str] = mapped_column(String(256))

    parent = relationship(Job, foreign_keys=parent_id, backref="job_children")
    child = relationship(Job, foreign_keys=child_id, backref="job_parents")

    def __init__(self, parent: Job, child: Job, relation: str):
        self.parent = parent
        self.child = child
        self.relation = relation


job_list = json['job_list']
data = {}

for job in session.query(Job).filter(job.id.in_(job_list)).all():
    data[job.id] = {
        'children' = [child.id for child in job.children]
        'job_name' = job.job_name
        'logs' = job.logs
        'parents' = [parent.id for parent in job.parents]
0

There are 0 answers