I'm trying to model the flow of inventory within a given time period as a graph. The information is stored either in an RDBMS or in CSV files.
What I'm trying to accomplish is converting the following table:
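| Product | FromLoc | ToLoc | Qty | TransactionType | TransactionTime |
|---------|---------|-------|-----|-----------------|-----------------|
| A       | Loc1    | Loc2  | 10  | Move            | 1/1/2017 10:00  |
| A       | Loc0    | Loc2  | 15  | Move            | 1/1/2017 11:00  |
| A       | Loc2    | Loc3  | 25  | Move            | 1/1/2017 12:00  |
| A       | Loc3    |       | 5   | Scrap           | 1/1/2017 14:00  |
| A       | Loc3    | Loc4  | 20  | Move            | 1/1/2017 16:00  |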
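into something like a graph in which each location is a node and each transaction is a relationship from one location to another, carrying the product, quantity, and transaction type. I've been trying to do this with Neo4j, but I'm new to this business.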
Suggestions are welcome!
Thanks.
When modeling for a graph database, it usually makes sense to model the important entities in your data model as nodes. This is especially helpful if you plan to look up information by properties, because indexes and unique constraints apply to label/property combinations on nodes and allow fast lookups.
In your current model, important data is stored in relationships (transaction type, product, quantity), and depending on the kinds of queries you want to run, you may have trouble querying it in this form. For example, suppose you wanted to find all transactions that happened within a given time span, across all products. Since relationships and their properties are not indexed (at least in current Neo4j versions; relationship property indexes only arrived later, in Neo4j 4.3), every relationship would have to be checked, and the query would get progressively slower as the number of relationships increases.
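To make the cost difference concrete, here is a small Python analogy. It is not how Neo4j implements indexes; the `bisect`-based `TimeIndex` class is a hypothetical stand-in for a property index. Filtering every record by time is a full scan whose cost grows with the number of records, while a sorted index can locate a time range with a binary search and return only the matching records.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# In-memory stand-in for transaction records: (time, product, qty, type),
# taken from the table in the question.
transactions = [
    (datetime(2017, 1, 1, 10, 0), "A", 10, "Move"),
    (datetime(2017, 1, 1, 11, 0), "A", 15, "Move"),
    (datetime(2017, 1, 1, 12, 0), "A", 25, "Move"),
    (datetime(2017, 1, 1, 14, 0), "A", 5, "Scrap"),
    (datetime(2017, 1, 1, 16, 0), "A", 20, "Move"),
]


def scan_range(records, start, end):
    """Unindexed lookup: every record is inspected, so cost grows with len(records)."""
    return [r for r in records if start <= r[0] <= end]


class TimeIndex:
    """Hypothetical sorted index on the time property, analogous to a node property index."""

    def __init__(self, records):
        self._records = sorted(records, key=lambda r: r[0])
        self._keys = [r[0] for r in self._records]

    def range(self, start, end):
        # Binary-search the bounds, then slice out only the matching records.
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return self._records[lo:hi]


index = TimeIndex(transactions)
start, end = datetime(2017, 1, 1, 11, 0), datetime(2017, 1, 1, 14, 0)
print(len(scan_range(transactions, start, end)))  # 3
print(len(index.range(start, end)))  # 3
```

Both return the same three transactions (11:00, 12:00 and 14:00); the difference is that the scan touches all five records, while the index only touches the ones in range.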
Ultimately it comes down to what kinds of queries you plan on making on your data. If your proposed model allows you to make these queries efficiently (even with a large data set) and easily, and you think that future queries you plan on writing will also be efficient and easy, then it's fine as is.
As a possible alternative model, consider modeling transactions not as relationships but as :Transaction nodes (which you could additionally label with the transaction type, to speed up queries restricted to transactions of certain types).
A :Transaction node might have properties such as the product, the quantity, and the transaction time (in effect, the columns of your table).
It could have relationships to :Location nodes like so: (:Location)-[:Txn_Out]->(:Transaction)-[:Txn_To]->(:Location)
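As a minimal sketch of this model, here is plain Python that turns the rows from the question into :Transaction nodes with :Txn_Out and :Txn_To edges. The property names (`product`, `qty`, `time`), the `txn0`, `txn1`, ... id scheme, and the `Graph`/`Node` classes are hypothetical choices for illustration, not Neo4j's API. Each transaction gets the labels Transaction plus its type, an outgoing edge from its source location, and an edge to its destination only when there is one.

```python
from dataclasses import dataclass, field

# Rows from the question: (product, from_loc, to_loc, qty, type, time).
# An empty to_loc marks a transaction with no destination (the scrap row).
ROWS = [
    ("A", "Loc1", "Loc2", 10, "Move", "1/1/2017 10:00"),
    ("A", "Loc0", "Loc2", 15, "Move", "1/1/2017 11:00"),
    ("A", "Loc2", "Loc3", 25, "Move", "1/1/2017 12:00"),
    ("A", "Loc3", "", 5, "Scrap", "1/1/2017 14:00"),
    ("A", "Loc3", "Loc4", 20, "Move", "1/1/2017 16:00"),
]


@dataclass
class Node:
    id: str
    labels: set
    props: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    rels: list = field(default_factory=list)  # (start_id, rel_type, end_id)

    def merge_node(self, node_id, labels, props=None):
        # Similar in spirit to Cypher MERGE: reuse an existing node instead of duplicating it.
        if node_id not in self.nodes:
            self.nodes[node_id] = Node(node_id, set(labels), dict(props or {}))
        return self.nodes[node_id]


def build_graph(rows):
    g = Graph()
    for i, (product, from_loc, to_loc, qty, txn_type, time) in enumerate(rows):
        txn_id = f"txn{i}"  # hypothetical id scheme, for illustration only
        # Extra label with the transaction type, e.g. {"Transaction", "Scrap"}.
        g.merge_node(txn_id, {"Transaction", txn_type},
                     {"product": product, "qty": qty, "time": time})
        g.merge_node(from_loc, {"Location"})
        g.rels.append((from_loc, "Txn_Out", txn_id))
        if to_loc:  # scrap rows get no Txn_To, so nothing is left dangling
            g.merge_node(to_loc, {"Location"})
            g.rels.append((txn_id, "Txn_To", to_loc))
    return g


g = build_graph(ROWS)
print(sum(1 for n in g.nodes.values() if "Location" in n.labels))  # 5
print(len(g.rels))  # 9
```

The five rows produce five locations and nine relationships: one :Txn_Out per transaction, plus a :Txn_To for each of the four moves.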
Choosing the right relationship types can be tricky here. On one hand, you want the relationships to be semantically correct. On the other hand, in your case, modeling the flow of products, and being able to visualize that flow, is important. If we used :From and :To relationships instead, the semantically correct direction would always be outgoing from the :Transaction and incoming to the :Location. There's nothing wrong with that, and your queries would work fine, but a visualization of the graph might be confusing: because both :From and :To relationships would point into the :Location, you couldn't see the flow of products from relationship direction alone. It's up to you how to model this. My suggestion above of :Txn_Out and :Txn_To is the best I could come up with that keeps the semantics correct while keeping the relationship direction flowing one way.
Another advantage of using nodes for :Transactions shows up with scrap transactions, or any transaction where the product does not flow on to any other node in your graph. Neo4j does not allow dangling relationships (a relationship must connect two existing nodes), so you can't model a scrap transaction as in your diagram. With a :Transaction:Scrap node, all you need is a :Txn_Out relationship to that node from the relevant location, and no :Txn_To relationship anywhere.
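To illustrate that a scrap node with only a :Txn_Out edge is enough to account for inventory, here is a small Python sketch that computes the net quantity change per location. It assumes the edges are stored as (start, type, end) tuples and that each transaction's quantity is available by its (hypothetical) node id; the quantities come from the question's table.

```python
from collections import defaultdict

# Transaction node id -> quantity (hypothetical ids; quantities from the question's table).
QTY = {"txn0": 10, "txn1": 15, "txn2": 25, "txn3": 5, "txn4": 20}

# (start node, relationship type, end node); txn3 is the scrap, with no Txn_To.
RELS = [
    ("Loc1", "Txn_Out", "txn0"), ("txn0", "Txn_To", "Loc2"),
    ("Loc0", "Txn_Out", "txn1"), ("txn1", "Txn_To", "Loc2"),
    ("Loc2", "Txn_Out", "txn2"), ("txn2", "Txn_To", "Loc3"),
    ("Loc3", "Txn_Out", "txn3"),
    ("Loc3", "Txn_Out", "txn4"), ("txn4", "Txn_To", "Loc4"),
]


def net_flow(rels, qty):
    """Net quantity change per location: -qty for each Txn_Out, +qty for each Txn_To."""
    net = defaultdict(int)
    for start, rel_type, end in rels:
        if rel_type == "Txn_Out":
            net[start] -= qty[end]  # product leaves the location into the transaction
        elif rel_type == "Txn_To":
            net[end] += qty[start]  # product arrives at the destination location
    return dict(net)


print(net_flow(RELS, QTY))
# {'Loc1': -10, 'Loc2': 0, 'Loc0': -15, 'Loc3': 0, 'Loc4': 20}
```

The per-location totals sum to -5, which is exactly the quantity that left the system through the scrap transaction.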
If you want to model more data for each product, I recommend creating :Product nodes to hold those properties and attaching them to the :Transaction nodes that involve those products.
Also, regarding your date/time data: Neo4j represents time as milliseconds since the epoch in UTC (which is what the timestamp() function returns), and it doesn't have good built-in support for parsing date/time strings or formatting them back. (Neo4j 3.4 and later add native temporal types such as date and datetime.) I highly recommend installing the APOC Procedures library, which supports date/time operations as well as many other useful functions and procedures.
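For reference, here is the conversion from the question's time strings to milliseconds since the epoch (the unit timestamp() uses), sketched in Python for a preprocessing step before loading. It assumes the strings are month/day/year and already in UTC; the question doesn't say either, so adjust the format string and time zone to match your data. Inside Neo4j, APOC's apoc.date.parse covers the same kind of conversion.

```python
from datetime import datetime, timezone


def to_epoch_millis(text, fmt="%m/%d/%Y %H:%M"):
    """Parse a string like '1/1/2017 10:00' into milliseconds since the Unix epoch.

    Assumes month/day/year order and that the time is in UTC.
    """
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return round(dt.timestamp() * 1000)


print(to_epoch_millis("1/1/2017 10:00"))  # 1483264800000
```

Storing times as plain integers like this keeps range comparisons simple, for example finding every transaction between two millisecond values.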