How can I implement the data mesh concept in a data engineering product or application?


I am trying to implement the data mesh concept in a business-related application. Let me describe the setup first:

We already use HDFS, Hive, and a Cassandra database to manage data.

1: As I understand it, in the data mesh concept multiple databases, on-premise data stores, data lakes, and data warehouses are connected, and the data is served in a distributed way; each data warehouse, data lake, or database is one node of the mesh. Is this overall understanding of data mesh correct?

2: How can I implement this in my project? I am trying it with the GraphDB database, because it supports clustering with other databases as master and worker nodes (repositories).

3: Can I try another platform other than GraphDB, such as Neo4j? Is that possible?

Can anyone help me implement data mesh in my project, or point me to a reference implementation?


There are 3 answers

whatsinthename

AFAIK, the data mesh concept is about decentralizing data warehouses and data lakes into multiple domains, with each domain owning its data. So you alone can't do everything; you also need governance policies. It is not about deploying everything under a single node; it works in a distributed manner. I would recommend studying the concept thoroughly first.
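To make "domains owning their data" concrete, here is a minimal sketch in Python (not from the original answer; every class and field name is hypothetical) of a domain-owned data product registered in a catalog that holds only metadata, so storage and ownership stay decentralized:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch: each domain team publishes its data as a "product"
# with discoverable metadata, instead of pushing everything into one
# central warehouse. These names are illustrative, not a real library.

@dataclass
class DataProduct:
    domain: str                         # owning domain, e.g. "orders"
    name: str                           # product name, e.g. "daily_orders"
    output_port: Callable[[], object]   # how consumers read the data
    schema: Dict[str, str] = field(default_factory=dict)
    owners: List[str] = field(default_factory=list)

class MeshCatalog:
    """A central *catalog*, not central storage: it stores only metadata
    and pointers, so governance stays federated across domains."""
    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[f"{product.domain}.{product.name}"] = product

    def discover(self, key: str) -> DataProduct:
        return self._products[key]

# Usage: the "orders" domain registers a product backed by its own store.
catalog = MeshCatalog()
catalog.register(DataProduct(
    domain="orders",
    name="daily_orders",
    output_port=lambda: [{"order_id": 1, "total": 9.99}],  # stub reader
    schema={"order_id": "int", "total": "float"},
    owners=["orders-team@example.com"],
))
print(catalog.discover("orders.daily_orders").schema)
```

In a real mesh the output port would point at your HDFS/Hive tables or Cassandra keyspaces rather than an in-memory stub, but the division of responsibilities is the same.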

Sean Martin

If you are looking to build a huge-scale graph for analytics, take a look at AnzoGraph DB, a massively parallel processing (MPP) graph data warehouse engine that achieves near-linear scale-up by horizontally adding commodity Intel servers. The architecture is shared-nothing, so all data is automatically sharded across the cluster, and every query is automatically decomposed into C++ programs that run simultaneously in parallel on every CPU core.

AnzoGraph is optimized for OLAP-style workloads: extremely fast parallel loads, vast datasets, complex analytical queries, dynamic and materialized views, and the strong ELT performance needed to iteratively clean, link, and reshape graph data in the database. Unlike most OLAP and graph systems, the database is schemaless, which allows immediate direct loading of (even dirty) source data without building ETL pipelines and a target schema up front, or pre-forming source data into a graph before loading it. A virtual graph option (data virtualization/federated query) is in preview; it lets you optionally leave parts of the graph source data in the original source, accessed only as referenced through automatic push-down queries. There is a free single-server edition.
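Not part of the original answer, but for a concrete feel: AnzoGraph exposes a SPARQL endpoint, so querying it from Python with the SPARQLWrapper library might look like the sketch below. The endpoint URL/port and all graph URIs are assumptions for illustration; adjust them to your deployment.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint: a SPARQL HTTP port on localhost (check your install).
sparql = SPARQLWrapper("http://localhost:7070/sparql")
sparql.setReturnFormat(JSON)

# A generic analytical-style query; the classes and predicates are
# hypothetical placeholders, not part of any shipped dataset.
sparql.setQuery("""
    SELECT ?customer (COUNT(?order) AS ?orders)
    WHERE {
        ?order a <http://example.com/Order> ;
               <http://example.com/placedBy> ?customer .
    }
    GROUP BY ?customer
    ORDER BY DESC(?orders)
    LIMIT 10
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["customer"]["value"], row["orders"]["value"])
```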

Note that AnzoGraph is not designed for OLTP workloads the way Neo4j or Neptune are.

Disclaimer: I work for Cambridge Semantics Inc.

Herk

Whilst I was working at one of the largest healthcare companies in the world, we designed and built the world's largest healthcare "Mesh" DB that sat on top of our managed data warehouses.

When conceptualizing the database, we projected we would have 52 TB of data in RAM within 3 years (this was back in 2018). After researching the graph DBs on the market (Anzo, Neptune, Neo4j), we ended up going with TigerGraph for its speed and scale. TigerGraph allows you to scale horizontally (adding more machines to create a larger cluster).
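As a hedged illustration (not from the original answer), connecting to a TigerGraph instance from Python with the pyTigerGraph client might look like this; the host, graph name, credentials, and query name below are all placeholders:

```python
import pyTigerGraph as tg

# All connection details are placeholders for illustration.
conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",  # e.g. a tgcloud.io sandbox
    graphname="MyGraph",
    username="tigergraph",
    password="your-password",
)

# On TigerGraph Cloud, REST calls are authenticated with a token
# derived from a secret.
secret = conn.createSecret()
conn.getToken(secret)

# Run a query previously installed on the graph
# ("most_connected" is a hypothetical query name).
results = conn.runInstalledQuery("most_connected", params={"limit": 10})
print(results)
```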

If you would like some resources on getting started: https://community.tigergraph.com/t/tigergraph-getting-started-guide/11

If you would like a free sandbox environment to play around in: https://tgcloud.io