The following key:value pairs are 'page' and 'page contents'.
{
'section-a.html':{'contents':'section-b.html section-c.html section-d.html'},
'section-b.html':{'contents':'section-d.html section-e.html'},
'section-c.html':{'contents':'product-a.html product-b.html product-c.html product-d.html'},
'section-d.html':{'contents':'product-a.html product-c.html'},
'section-e.html':{'contents':'product-b.html product-d.html'},
'product-a.html':{'contents':''},
'product-b.html':{'contents':''},
'product-c.html':{'contents':''},
'product-d.html':{'contents':''}
}
For any given 'item' how could I find the path(s) to said item? With my very limited knowledge of data structures in most cases, I'm assuming this would be a hierarchy tree. Please correct me if I'm wrong!
UPDATE: My apologies, I should have been more clear about the data and my expected outcome.
Assuming 'page-a' is an index, each 'page' is literally a page appearing on a website, where as each 'item' is something like a product page that would appear on Amazon, Newegg, etc.
Thus, my expected output for 'item-d' would be a path (or paths) to that item. For example (delimiter is arbitrary, for illustration here): item-d has the following paths:
page-a > page-b > page-e > item-d
page-a > page-c > item-d
UPDATE2: Updated my original dict
to provide more accurate and real data. '.html' added for clarification.
Here's a simple approach -- it's O(N squared), so, not all that highly scalable, but will serve you well for a reasonable book size (if you have, say, millions of pages, you need to be thinking about a very different and less simple approach;-).
First, make a more usable dict, mapping page to set of contents: e.g., if the original dict is
d
, make another dictmud
as:Then, make the dict mapping each page to its parent pages:
Here, I'm using lists of parent pages (sets would be fine too), but that's OK for pages with 0 or 1 parents as in your example, too -- you'll just be using an empty list to mean "no parent", else a list with the parent as the one and only item. This should be an acyclic directed graph (if you're in doubt, you can check, of course, but I'm skipping that check).
Now, given a page, finding the paths up its parent(s) to a parentless-parent ("root page") simply require "walking" the
parent
dict. E.g., in the 0/1 parent case:If you can clarify your specs better (ranges of number of pages per book, number of parents per page, and so on), this code can no doubt be refined, but as a start I hope it can help.
Edit: as the OP clarified that cases with > 1 parent (and so, multiple paths) are indeed of interest, let me show how do deal with that:
Of course, instead of
print
ing, you canyield
each path when it reaches a root (making the function whose body this is into a generator), or otherwise treat it in whatever way you require.Edit again: a commenter is worried about cycles in the graph. If that worry's warranted, it's not hard to keep track of nodes already seen in a path and detect and warn about any cycles. Fastest is to keep a set alongside each list representing a partial path (we need the list for ordering, but checking for membership is O(1) in sets vs O(N) in lists):
It's probably worthwhile, for clarity, packing the list and set representing a partial path into a small utility class Path with suitable methods.