We have a couple of MySQL tables with lft/rgt (nested set/modified preorder tree traversal), parent_id and level columns in them. Every night we wipe the tables and fetch new data from our client. Because of the way we fetch the data from our client's db we can easily calculate parent_id and level, but lgt and rgt are set after our sync with a stored procedure. Up until now we've only had to deal with quite small result sets, not more than 30,000 rows. But now we're looking at over 200,000 of em, and it's taking forever. I've had the sync running for well over an hour and it's still not done, normally it would take about 5-15 minutes (which I also feel is a bit much).
Is there an alternative way to calculate lft/rgt before storing in db? (preferrably with python)
Some pseudo-code of what our sync looks:
class Node(object):
def __init__(self, id, data=None):
self.id = id
self.children = []
self.data = data
self.parent = None
def add_child(self, child):
self.children.append(child)
child.parent = self
def sync(source_sql_result):
node_map = {}
current_id = 0
parent_node = None
for source_row in source_sql_result:
for i, row in enumerate(get_subrows(source_row), 1):
try:
parent_node = node_map[row['identifier']]
except KeyError:
# New node found
current_id += 1
row['level'] = i
node = Node(id=current_id, data=row)
if parent_node is not None:
parent_node.add_child(node)
node_map[row['identifier']] = node
parent_node = node
for node in node_map.values():
yield node.data