Massive data operations in the stored proc to DDD

1.7k views Asked by At

Lets take an example of a product classification. All the products needs to be classified as vegetable or not. The business logic is, the product can be classified as vegetable if that product is from company A, B & C. If the product is not from those companies they are not vegetables. There are millions of products. This can be done in a stored proc with few lines of code. The operation may take only few seconds if it is done synchronizely.

As I understand, the DDD goes against the idea of putting the logic in the stored procedure. The logic can be put as a behavior on product which can self classify based on who is the source. To do this, all the million products need to be read into memory, process and then save it back to the database.

The problem here is the large amount of memory this operation needs. If the operation is done in chucks like 50,000 the repository has to first figure out how may products needs to be classified and should tell the domain the long running operation has to go in chunk. Surely, this approach is going to take more time and a bad user experience for the user who has to wait more time than a process than a stored procedure takes.

What is the reasonable approach to DDD when it comes to long running processes? Is the delay expected, so the app has to inform the user that the classification is going to take time and will let the user know when that is complete? And should not use stored procedure, but have the logic part of the domain.

UPDATE

Just to add some clarity, this classification process is done quite often. The application has to support the classification process, not an ETL or can't wait longer. That's why I'm trying to find the trade offs between using a stored procedure versus DDD.

Also note that it is not a Query, but a Command. The command can be called ClassifyAllProductsCommand(). When this command is run, there was no classification before. After the classification, other users of the system should see the new classification. For example, the product A is classified as Unavailable, and after the classification it can be Vegetable or Meat.

3

There are 3 answers

2
Adrian Thompson Phillips On

I think you're confusing DDD. If you were looking for Vegetable type Products, you would call a service that would retrieve Products for a particular Company. There would be no need to load all the products into memory.

Application or domain-centric design, just means designing your application around the business domain and not from a collection of database tables upwards (like a data-centric approach).

In contrast you end up with more data associations (joins) being done in your application and less in monolithic stored procedures. Which moves all your business logic into the application and not in the persistence device (the database), which kinda makes a lot of sense.

Also, if you deny yourself huge table joins then you also think carefully about things that traditionally cause massive overhead on your database and end up moving towards better design, like creating a separate reporting database, message buses, asynchronous tasks, etc.

EDIT

It seems like a common phrase in DDD but "it depends on your specific domain".

Without knowing the detail, I would want to know how often these classifications occur. Can they be done as the Products are created? Are they done often or rarely, planned or unpredictably?

If the classifications are common and must be done across all one million products, it might be best to create a smaller model for the Product, maybe something with just SmallProduct.Id and SmallProduct.CompanyId (probably naming it something better). Then data cache this smaller collection in memory and perform operations against it.

If the check to see if the product is a Vegetable is common and only one of a few possible classifications, it might be best to have Classifications in their own table and a linking table to link them to Products. Then the problem becomes more of a one time data setup issue.

On the rare chance that you're using a Document Database, you could just store these classifications in a collection on the Product object itself.

0
Eben Roux On

Classification is an interesting thing. It is a separate thing. Classification should never be implemented as structure... but that is another story :)

Your classification may even be regarded as a bounded context in the same way that reporting may be a bounded context. As such you may wish to handle classification separately. Your classification is not an aggregate root. It plays an auxiliary role. If it has no impact on the consistency in your domain modelling it may not even necessarily be part of your Product aggregate. It may be added and it may even be changed independently (not as bulk) but if it is used to determine the validity of your aggregate then your classification sub-system is going to have to take that into account.

Please bear in mind that it isn't a matter of DDD vs a stored procedure. You are executing queries against your data store. Whether that is done via a stored procedure or dynamically should not affect your decision. There is nothing preventing, say, a ProductRepository from calling a stored procedure.

You can have your classification sub-system still execute your SP or use DML directly. However, this isn't necessarily going to be part of your domain. You most certainly do not want to classify each product individually if it is something that happens quite often and as a bulk operation. If your current design dictates that these are bulk operations then keep them as such and don't force them into a DDD structure that is going to be prohibitive.

It is a design choice and sometimes making changes to individual items does not make sense. It should certainly be your aim to work on a single aggregate at a time but things like reporting or classification are another animal that don't always fit cleanly into the Domain-Driven Design thinking.

0
Fabio Marreco On

It seams you are interpreting "classification" as you aggregate root, containing products (as entities).

Honestly, it does not feel like a good design decision (I might be wrong, depends on the requirements specifics).

What if you think of the product as aggregate root (containing suppliers, discounts, etc.)?. In that case, you´ll need to load only one product at a time.

If the classification/supplier has a complex domain, you should consider having a separate bounded context for that.

Also, in your comment:

Just to add some clarity, this classification process is done quite often. The application has to support the classification process, not an ETL or can't wait longer. That's why I'm trying to find the trade offs between using a stored procedure versus DDD.

REALLY? You can´t fire an event and have the product service update the classification when the there´s an update on the supplier? The user will have an inconsistent state (say.. "undefined" category"), for a few seconds/minutes. It is not that bad, is it ?

But, if you are talking about a batch job, then, by all means, go with the stored procedure.