We have a couple of MySQL tables with lft
/rgt
(nested set/modified preorder tree traversal), parent_id
and level
columns in them. Every night we wipe the tables and fetch new data from our client. Because of the way we fetch the data from our client's db we can easily calculate parent_id
and level
, but lgt
and rgt
are set after our sync with a stored procedure. Up until now we've only had to deal with quite small result sets, not more than 30,000 rows. But now we're looking at over 200,000 of em, and it's taking forever. I've had the sync running for well over an hour and it's still not done, normally it would take about 5-15 minutes (which I also feel is a bit much).
Is there an alternative way to calculate lft
/rgt
before storing in db? (preferrably with python)
Some pseudo-code of what our sync looks:
class Node(object):
def __init__(self, id, data=None):
self.id = id
self.children = []
self.data = data
self.parent = None
def add_child(self, child):
self.children.append(child)
child.parent = self
def sync(source_sql_result):
node_map = {}
current_id = 0
parent_node = None
for source_row in source_sql_result:
for i, row in enumerate(get_subrows(source_row), 1):
try:
parent_node = node_map[row['identifier']]
except KeyError:
# New node found
current_id += 1
row['level'] = i
node = Node(id=current_id, data=row)
if parent_node is not None:
parent_node.add_child(node)
node_map[row['identifier']] = node
parent_node = node
for node in node_map.values():
yield node.data