PROBLEM DESCRIPTION
Hello, I would like to implement a service that receives data from various providers and dumps it into a database (a sort of raw data store).
The issue is that each provider delivers the data differently: some stream it through a RabbitMQ exchange, others expose an API I can pull from, and others simply share CSV files.
I believe this is a common scenario for anyone who consumes large amounts of data from multiple providers.
QUESTION
Can somebody point me to the steps I should take in the design phase of this data ingestion pipeline to make it as scalable and maintainable as possible? Perhaps some well-known design patterns, or anything else that can come in handy in these fairly common scenarios.
One approach among many might be:
What first came to mind was an integration framework like Apache Camel or Spring Integration. To make it practical, implement a route for each data source (a RabbitMQ consumer, a file consumer, an HTTP producer that polls the API, etc.), then shape/map/transform the data so it can be stored in the database, which is the last step of each route. It's quite straightforward this way.
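To make that concrete, here is a minimal Camel sketch of the idea, assuming one route per provider that converges on a shared persistence step. The exchange/queue names, endpoint URLs, polling period, and the raw_events table are all hypothetical placeholders, not anything from a real provider:

    import org.apache.camel.builder.RouteBuilder;

    public class IngestionRoutes extends RouteBuilder {
        @Override
        public void configure() {
            // Provider A: consume the RabbitMQ stream (names are placeholders)
            from("rabbitmq:providerExchange?queue=rawIngest&autoDelete=false")
                .to("direct:store");

            // Provider B: poll the HTTP API once a minute (hypothetical URL)
            from("timer:pollProviderB?period=60000")
                .to("https://api.provider-b.example/data")
                .to("direct:store");

            // Provider C: pick up shared CSV files from a drop directory
            from("file:/data/incoming?include=.*\\.csv")
                .unmarshal().csv() // camel-csv parses each file into rows
                .to("direct:store");

            // Common sink: map each payload to the raw-store schema, then insert
            from("direct:store")
                .process(exchange -> {
                    // normalize the provider-specific payload here, e.g. build
                    // the body map the SQL named parameter below expects
                })
                .to("sql:insert into raw_events (payload) values (:#payload)");
        }
    }

With Spring Boot, the camel-spring-boot starter picks up RouteBuilder beans automatically. The nice property of this shape is that adding a new provider means adding one new `from(...)` route; the transform-and-persist step behind `direct:store` stays unchanged.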