Sort direct neighbor nodes (books) by attribute of 2nd degree neighbors (authors) for user book list?

79 views Asked by At

By agheranimesh via Slack:

This is my graph, named LibraryGraph:

LibraryGraph

My graph query:

FOR v, e, p IN 1..2 OUTBOUND "User/001" GRAPH "LibraryGraph"
   SORT p.vertices[2].Name
   RETURN  p.vertices[1]

It's not giving me result I want. I want a book list sorted by author name and books without author should come last (B2, B3, B1, B4, B5).

Script to re-create the data (arangosh --javascript.execute <file>):

db._createDatabase('Library')
db._useDatabase('Library')

const User = db._create('User')
const Book = db._create('Book')
const Author = db._create('Author')

const User_Book = db._createEdgeCollection('User_Book')
const Book_Author = db._createEdgeCollection('Book_Author')

User.save({ '_key': '001', 'UserName': 'U1' })

Book.save({ '_key': 'B1', 'Name': 'B1' })
Book.save({ '_key': 'B2', 'Name': 'B2' })
Book.save({ '_key': 'B3', 'Name': 'B3' })
Book.save({ '_key': 'B4', 'Name': 'B4' })
Book.save({ '_key': 'B5', 'Name': 'B5' })

Author.save({ '_key': 'A', 'Name': 'A' })
Author.save({ '_key': 'B', 'Name': 'B' })
Author.save({ '_key': 'X', 'Name': 'X' })
Author.save({ '_key': 'Y', 'Name': 'Y' })
Author.save({ '_key': 'Z', 'Name': 'Z' })

User_Book.save({ '_from': 'User/001', '_to': 'Book/B1' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B2' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B3' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B4' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B5' })

Book_Author.save({ '_from': 'Book/B2', '_to': 'Author/A' })
Book_Author.save({ '_from': 'Book/B3', '_to': 'Author/B' })
Book_Author.save({ '_from': 'Book/B1', '_to': 'Author/X' })
Book_Author.save({ '_from': 'Book/B1', '_to': 'Author/Y' })
Book_Author.save({ '_from': 'Book/B1', '_to': 'Author/Z' })

const graph_module =  require('org/arangodb/general-graph')
const graph = graph_module._create('LibraryGraph')

graph._addVertexCollection('User')
graph._addVertexCollection('Book')
graph._addVertexCollection('Author')

graph._extendEdgeDefinitions(graph_module._relation('User_Book', ['User'], ['Book']))
graph._extendEdgeDefinitions(graph_module._relation('Book_Author', ['Book'], ['Author']))
1

There are 1 answers

0
CodeManX On BEST ANSWER

Instead of a single traversal with variable depth (1..2) to cover both cases, books with and without authors, I suggest to use two traversals:

FOR book IN OUTBOUND "User/001" GRAPH "LibraryGraph"
    LET author = FIRST(
        FOR author IN OUTBOUND book._id GRAPH "LibraryGraph"
            SORT author.Name
            LIMIT 1
            RETURN author.Name
    ) OR "\uFFFF"
    SORT author
    RETURN book

First we traverse from User/001 to the linked books. Then we do a second traversal from each book to the linked authors. This may return 0, 1 or multiple authors. The sub-query caps the result to the alphabetically first author (e.g. X out of X, Y, Z) and returns the name.

In the scope of the main query, we take the author name or fallback to a value that ends up last if sorted (null would end up first, which is not desired here). Then we sort the books by author name and return them:

Query result

Another way to achieve this result, yet harder to understand:

FOR v, e, p IN 1..2 OUTBOUND "User/001" GRAPH "LibraryGraph"
    LET name = p.vertices[2].Name OR "\uFFFF"
    COLLECT book = p.vertices[1] AGGREGATE author = MIN(name)
    SORT author
    RETURN book

The traversal returns paths with 2 or 3 vertices...

   [0]         [1]         [2]
User/001 --> Book/B2
User/001 --> Book/B2 --> Author/A
User/001 --> Book/B3
User/001 --> Book/B3 --> Author/B
User/001 --> Book/B4
User/001 --> Book/B5
User/001 --> Book/B1
User/001 --> Book/B1 --> Author/Y
User/001 --> Book/B1 --> Author/X
User/001 --> Book/B1 --> Author/Z

The authors at index 2 (p.vertices[2]) or a fallback value is temporarily stored in a variable name. Then the book vertices are grouped together to eliminate duplicates (caused by the variable traversal depth, which returns e.g. 001-->B2 but also the longer path 001-->B2-->A).

Aggregation is used to pick the author name with the lowest value (MIN), which usually means the alphabetically first - it probably doesn't work correctly for some languages and character sets however, whereas SORT does sort correctly based on the rules of the set language (can only be one per DBMS instance).

The grouping result - distinct book documents - is sorted by author names and returned.