Data retrieval and search across multiple services


I'm building a system that comprises multiple heterogeneous services that talk to each other over a network, although in the standard deployment model they all run on the same machine. The UI client for managing the entities within that system should be able to display aggregated data from all of the constituent services and support search across that aggregated data.

How should I design data retrieval in this system so that it scales, given that the amount of data to be searched is already large and keeps growing?

I'm thinking about two approaches:

  1. The client queries data from all services on demand and aggregates the results in its layer. In many cases it will have to do joins between data coming from multiple services, so I'm concerned about performance here.

  2. Denormalize each service's data so that it is convenient for the client's queries, and also store precomputed aggregations across the services' data so that the client doesn't have to do joins on demand. It would probably be better to store each service's denormalized data in its own database or cache, since that would make it easier to keep the denormalized data up to date. However, I'd need to put the aggregated views that span multiple services somewhere else, and I'm concerned about the overhead of keeping this remote cache up to date.
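To make the two options concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `users` and `orders` records stand in for whatever two services would return, and `join_on_demand` and `AggregatedView` are illustrative names, not part of any existing system.

```python
from collections import defaultdict

# Hypothetical records, as two separate services might return them.
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 5.0},
    {"order_id": 12, "user_id": 2, "total": 7.5},
]


def join_on_demand(users, orders):
    """Approach 1: fetch both result sets, then hash-join them in the client."""
    orders_by_user = defaultdict(list)
    for order in orders:
        orders_by_user[order["user_id"]].append(order)
    return [{**user, "orders": orders_by_user[user["user_id"]]} for user in users]


class AggregatedView:
    """Approach 2: a denormalized view, updated whenever a change arrives."""

    def __init__(self):
        self.rows = {}  # user_id -> pre-joined row

    def on_user(self, user):
        row = self.rows.setdefault(user["user_id"], {"orders": []})
        row.update(user)

    def on_order(self, order):
        row = self.rows.setdefault(
            order["user_id"], {"user_id": order["user_id"], "orders": []}
        )
        row["orders"].append(order)

    def search(self, name_prefix):
        # No join at query time: each row already holds its orders.
        return [
            row for row in self.rows.values()
            if row.get("name", "").startswith(name_prefix)
        ]
```

In the first approach the join cost is paid on every query; in the second it is paid once per change, at the price of having to deliver every change to the view.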

Any examples or references to existing architectures that solve similar problems would be highly appreciated. Thanks!


There is 1 answer

Pellared On

An aggregated cache would surely perform better, but think carefully about its cost: synchronization. Your client (or some remote service that does this job on behalf of the clients) will end up with its own database that synchronizes with the services' data, which amounts to implementing your own asynchronous pull replication. Check how the data retrieved from the services can change. The best case for you is data that is never deleted or modified, so that only new records are added. It is also easier if the data does not have to be consistent. The appropriate synchronization mechanism depends on the existing architecture and requirements.
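For the easy case described above (append-only data, no strict consistency), pull replication could be sketched roughly as follows. This is only an illustration under those assumptions: `Service`, `changes_since`, and `PullReplica` are hypothetical names, and a real service would expose its changes over the network rather than through an in-memory list.

```python
class Service:
    """Stand-in for a remote service with an append-only log (hypothetical)."""

    def __init__(self):
        self._log = []

    def append(self, record):
        self._log.append(record)

    def changes_since(self, cursor):
        """Return the records added after `cursor`, plus the new cursor."""
        return self._log[cursor:], len(self._log)


class PullReplica:
    """Local aggregated store that periodically pulls new records."""

    def __init__(self, services):
        self.services = services                      # name -> Service
        self.cursors = {name: 0 for name in services}  # last position seen
        self.records = []                             # aggregated copy

    def sync(self):
        """Pull everything new from each service; return how many records came in."""
        pulled = 0
        for name, service in self.services.items():
            new, cursor = service.changes_since(self.cursors[name])
            self.records.extend({"source": name, **record} for record in new)
            self.cursors[name] = cursor
            pulled += len(new)
        return pulled
```

Because records are only ever appended, one cursor per service is enough to resume after a failure. Supporting updates or deletes would require change events or versioning, which is where most of the synchronization cost comes from.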