I have just started looking into Neo4J to track in house developed applications that use SQL Server to what tables and columns those applications use.
[Background] We are currently re-architecting a few of our Monolithic applications into microservice based applications. These monoliths all share a common database and that database will be the source of truth until this process is complete. We can see that as this effort grows we will need to track which applications are using which tables and specifically which columns in those tables.
What we need to know, is this. "If one app needs to modify a column in a table, what other microservices need to be evaluated for possible negative impacts of those changes?" (yes) I know that a 'true' microservice would maintain it's own source of truth and reference other services to gain access to data not under their control. This project is to track that data during the conversion process. Basically Day 1 we will not magically turn on a few dozen services all with their own data sources.
[Question] My question is this.
I have three node types (Table, Column, App). I have mapped a relationship between Column -> Table and from App - > Column, but should I map a relationship from App - > Table as I can traverse the graph like this App --> Column --> Table
Look at the diagrams to get a visual as to what I am referring to.
I have included the Cypher code to create the graph.
I know that in the overall scheme of things I am basically asking about THREE (3) relationships (in my sample), so why even bother, as those relationships pail in comparison to the number of column relationships (just add them already).
The reason for the ask is more about correctness and learning best practices.
Thanks for the feedback.
Here is the Cypher Code used to create the sample graph
CREATE (Customers:Table {Name:'Customers'})
CREATE (CustomerId:Column {Name:'CustomerId'})
CREATE (FirstName:Column {Name:'FirstName' })
CREATE (LastName:Column {Name:'LastName' })
CREATE (Address1:Column {Name:'Address1' })
CREATE (City:Column {Name:'City' })
CREATE (State:Column {Name:'State' })
CREATE (Cellphone:Column {Name:'Cellphone' })
CREATE (eMail:Column {Name:'eMail' })
CREATE (App_1:App {Name:'Call Back system'})
CREATE (App_2:App {Name:'Support'})
CREATE (App_3:App {Name:'Marketing'})
CREATE (CustomerId)-[:CONTAINED_IN]->(Customers)
CREATE (FirstName)-[:CONTAINED_IN]->(Customers)
CREATE (LastName)-[:CONTAINED_IN]->(Customers)
CREATE (Address1)-[:CONTAINED_IN]->(Customers)
CREATE (City)-[:CONTAINED_IN]->(Customers)
CREATE (State)-[:CONTAINED_IN]->(Customers)
CREATE (Cellphone)-[:CONTAINED_IN]->(Customers)
CREATE (eMail)-[:CONTAINED_IN]->(Customers)
//App 1
CREATE (App_1)-[:REFERENCES]->(Customers)
CREATE (App_1)-[:USES]->(CustomerId)
CREATE (App_1)-[:USES]->(FirstName)
CREATE (App_1)-[:USES]->(LastName)
CREATE (App_1)-[:USES]->(Cellphone)
//App 2
CREATE (App_2)-[:REFERENCES]->(Customers)
CREATE (App_2)-[:USES]->(CustomerId)
CREATE (App_2)-[:USES]->(FirstName)
CREATE (App_2)-[:USES]->(LastName)
CREATE (App_2)-[:USES]->(Address1)
CREATE (App_2)-[:USES]->(City)
CREATE (App_2)-[:USES]->(State)
//App 3
CREATE (App_3)-[:REFERENCES]->(Customers)
CREATE (App_3)-[:USES]->(CustomerId)
CREATE (App_3)-[:USES]->(FirstName)
CREATE (App_3)-[:USES]->(LastName)
CREATE (App_3)-[:USES]->(eMail)
;
You should not introduce redundant relationships unless you know for certain that they are necessary in your data model (e.g., profiling of important use cases has shown that the ability to do 1 hop instead of 2 makes a necessary performance improvement).
In general, redundancy should be avoided, and is not worth the additional effort to keep the associated relationships in sync and the additional storage requirement.