Graph databases for modeling specific domain

299 views Asked by At

With a normal 'graph database' the data is broken up into nodes and edges, and there isn't much of a restriction/schema between the connections. With this, it seems great for modeling straightforward graphs where the relationships are relatively consistent -- Movies with cast and crew; Computer networks with IPs and devices; Social networks with users and connections; etc.

Are there any graph-like databases that can be more specialized? For example to be able to model something like an electrical circuit where each component has a sort of 'schema' or well defined input and output -- i.e., a Resistor has two connections and has various properties:

enter image description herea Transistor takes has three connections and has various properties, etc.

I'm not asking about particular circuit simulators, such as https://www.falstad.com/circuit/circuitjs.html, but more about whether it's possible in any graph (or pseudo-graph) databases to model and enforce very specific, well-defined relationships in a network, such as circuit design.

3

There are 3 answers

2
James Fleming On

Definitely possible.

I've been working on this problem with Neo4j, and Restagraph is the result. It provides a REST API that enforces a schema on any updates to the database, and I've packaged it as a Docker image.

I haven't really promoted it so far, because it's only recently been mature enough for my own use, and I really need to improve the documentation. If you try it out, though, I'd love to hear any feedback you have.

4
djhallx On

Objectivity/DB is object/graph database that uses schema. You can absolutely do what you are proposing. It supports complex object definitions including type inheritance and it has a full graph/navigational query language similar to Cypher. www.objectivity.com

0
Lars On

TLDR: in general yes, but it depends.

This is a really broad question, so let me break it down.

While it's a little exaggerating to talk about all graph databases (which are not as standardized as SQL databases - which in turn are not very standardized as well), so take this answer with a grain of salt: Yes, that is possible.

As in SQL databases, you usually can set up constraints to be checked before any changes in data is persisted.

Most graph databases incorporate something along the lines of a "type", similarly to what a table represents in SQL databases. Some allow to constrain relationships to only target specific types, so you could restrict relationships e.g. between a node using a CAN bus and an I2C-bus to the specific types.

If a database does not provide these mechanisms, it's usually possible to constrain relationships to the existence of specific keys and values in the model. To have another example than your circuit one: Imagine a node-based system, which has typed inputs and outputs - an int-based output can only be connected to an int based input, a float based output only to a float based input, etc. Then you could add a field output_type and input_type to the nodes and constrain relationships between the values.

As soon as you add the ability to write (the SQL-similar stored) procedures, you can write very complex data integrity constraints.

So, while it is possible, the question is, if you should.

How much logic you actually want to put into your database is a decades-long heated argument. At some point in your application architecture, you will have to check the validity of the data that you are handling. Handling the data consistency in the database itself solves a lot of problems with race conditions or performance issues through multiple round trips between the application and the database, which would occur if the consistency checks are done in the application layer.

Putting a lot of your logic into the database severely limits your ability to switch databases ("vendor lock-in"), might lead to code duplication between your application layer and your database, and sprays your logic between two (or more) layers of your architecture (which makes it harder to find bugs, introduces temporal coupling, and might re-introduce race conditions and performance problems where you have to use transactions again).

My personal take is along the lines of Steve Wozniak - see your database as another service. If that service can provide you with everything you need to ensure data integrity, it might be a good idea to just use the database directly. But if this increases the problems I mentioned before, you might be better off putting a layer between your database and your business logic.