Context: I have an application that allows the user to process all the mailed-in payments received that day. Sometimes an envelope may include multiple checks for the same account (think two roommates each paying their portion of a utility bill).
Restrictions: Process all the payments in batches of 10 but the Account ID must be unique per batch.
Very simplified Payment class:
public class Payment
{
    public int AccountId { get; set; }
    // ... other properties not important
}
A hypothetical collection of payments received in the mail today. Notice that the last two AccountId values are acceptable duplicates:
List<Payment> payments = new List<Payment>()
{
    new Payment() { AccountId = 1 },
    new Payment() { AccountId = 2 },
    new Payment() { AccountId = 3 },
    new Payment() { AccountId = 4 },
    new Payment() { AccountId = 5 },
    new Payment() { AccountId = 1 }, // Duplicate Account
    new Payment() { AccountId = 2 }  // Duplicate Account
    // likely hundreds more unique accounts, possibly even some more duplicates...
};
I'm using MoreLinq to try to select distinct accounts per batch but this code below is clearly not going to work. I feel like I'm close but have been unable to find a working solution. Again, the goal is to split all the payments into batches of N without duplicating the AccountId in that batch. Duplicate AccountIds must be spread across other batches so they don't cause a race condition when trying to update the customer's balance.
Edited code comments for clarity.
int batchSize = 10;
var paymentTasks = new List<Task>(batchSize);

// This linq expression is the heart of my question: How to divide the payments
// into batches while ensuring uniqueness of a particular key(s). This expression
// is close, but the DistinctBy() is obviously excluding the duplicates that
// I just intend to be distinct for that Batch(batchSize).
foreach (IEnumerable<Payment> batchOfPayments in payments.DistinctBy(a => a.AccountId).Batch(batchSize))
{
    // The rest of this method is for context only
    paymentTasks.Clear();
    foreach (Payment payment in batchOfPayments)
    {
        // Async method implementation not important
        Task paymentTask = ProcessPaymentAsync(payment);
        paymentTasks.Add(paymentTask);
    }

    // Await all the tasks in this batch to complete before starting the next batch
    await Task.WhenAll(paymentTasks);
}
Thank you for your time and for looking at my question.
If I understand the problem correctly, there are many ways to do this, and the best solution depends on your actual needs.
The assumptions are :
Then a very simple (yet efficient) batch and queue pattern can be used with minimal allocations.
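As a rough illustration of what a batch and queue approach could look like, here is a minimal sketch. The extension method name BatchUniqueBy and its parameters are hypothetical (it is not part of MoreLinq), and it assumes the key's default equality comparer is what defines a duplicate. Items whose key is already in the current batch are parked in a queue and get first pick of the next batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueBatchExtensions
{
    // Hypothetical helper (not part of MoreLinq): yields batches of at most
    // `size` items in which no two items share the same key.
    public static IEnumerable<List<T>> BatchUniqueBy<T, TKey>(
        this IEnumerable<T> source, int size, Func<T, TKey> keySelector)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));

        var leftovers = new Queue<T>(); // duplicates deferred to a later batch
        using (var e = source.GetEnumerator())
        {
            while (true)
            {
                var batch = new List<T>(size);
                var keys = new HashSet<TKey>();

                // One pass over the deferred items, oldest first. Anything
                // that still does not fit goes back on the queue in order.
                int deferredCount = leftovers.Count;
                for (int i = 0; i < deferredCount; i++)
                {
                    T item = leftovers.Dequeue();
                    if (batch.Count < size && keys.Add(keySelector(item)))
                        batch.Add(item);
                    else
                        leftovers.Enqueue(item);
                }

                // Top the batch up from the source, deferring duplicates.
                while (batch.Count < size && e.MoveNext())
                {
                    if (keys.Add(keySelector(e.Current)))
                        batch.Add(e.Current);
                    else
                        leftovers.Enqueue(e.Current);
                }

                // An empty batch means both the source and the queue are drained.
                if (batch.Count == 0) yield break;
                yield return batch;
            }
        }
    }
}
```

With something like this in place, the loop in the question would iterate over payments.BatchUniqueBy(batchSize, p => p.AccountId) instead of payments.DistinctBy(a => a.AccountId).Batch(batchSize). Note that a run of payments for one account produces several small batches at the end, since each of them needs a batch of its own.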
Given
And
Note: This is fairly resource efficient, though it probably makes one small, unnecessary allocation when dealing with pure leftovers. I am sure that could be removed, but I'll leave that up to you. There are also many efficiencies you could add; for example, with the use of a channel this could easily be turned into a consumer.
Test
Output
Full demo here to Play with