I have a scenario where I have to fetch 18,000 records from the database and convert them into DTOs. refIds holds 18,000 unique IDs from the reference table in the database.
int batchSize = 1000;

// Split the 18,000 ids into batches of 1,000.
var refBatchList = new List<List<int>>();
for (var i = 0; i < refIds.Count; i += batchSize)
{
    refBatchList.Add(refIds.Skip(i).Take(batchSize).ToList());
}

// One task per batch, each with its own context (note: these contexts are never disposed).
var dbDataRetrievalTasks = refBatchList.Select(refIdsBatch => GetReferencesAsync(refIdsBatch, new JIVE.data.Entities()));
var results = await Task.WhenAll(dbDataRetrievalTasks);
The code above creates multiple tasks that execute in parallel, each calling the method below to retrieve data from the database asynchronously.
private async Task<List<reference>> GetReferencesAsync(List<int> ids, JIVE.data.Entities db)
{
    // Read-only query: AsNoTracking skips change-tracking overhead.
    return await db.Set<reference>()
        .AsNoTracking()
        .Where(x => ids.Contains(x.reference_id))
        .ToListAsync();
}
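As a side note, on .NET 6 or later the manual batching loop above could be replaced with LINQ's Chunk. A minimal sketch, assuming refIds is a List<int> as in the code above:

// .NET 6+ alternative to the hand-rolled batching loop.
var refBatchList = refIds
    .Chunk(batchSize)                    // IEnumerable<int[]>, each array up to batchSize ids
    .Select(chunk => chunk.ToList())
    .ToList();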
My question is: since I am planning to make these DB calls in parallel, I will be using multiple DB connections. In this scenario there would be 18 different connections, one per batch (18,000 ids / 1,000 per batch = 18 batches).
So I am wondering: is it okay to retrieve data from the database like this? What are the cons associated with this approach? Is there a better way to achieve this?
I know that if there were more batches, this approach would create many more concurrent tasks for Task.WhenAll() to await, which could throttle the thread pool. But in my scenario the maximum number of batches I could get is 20.
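For illustration, if the number of batches ever did grow, the number of simultaneous DB calls could be capped with a SemaphoreSlim. This is a generic sketch, not code from the question; it reuses the GetReferencesAsync method above and disposes each context when its batch completes:

var throttle = new SemaphoreSlim(4);     // allow at most 4 concurrent DB calls (tune as needed)
var tasks = refBatchList.Select(async batch =>
{
    await throttle.WaitAsync();
    try
    {
        // One context per call, since DbContext instances are not thread-safe.
        using (var db = new JIVE.data.Entities())
        {
            return await GetReferencesAsync(batch, db);
        }
    }
    finally
    {
        throttle.Release();
    }
});
var results = await Task.WhenAll(tasks);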
For larger sets of IDs, have a look at AsQueryableValues(set) (https://github.com/yv989c/BlazarTech.QueryableValues), which uses OPENJSON behind the scenes. That would let you retrieve the desired 18k rows in a single call.
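A minimal sketch of that approach, assuming EF Core against SQL Server with QueryableValues registered via UseQueryableValues() inside the UseSqlServer options (see the project README); the entity and column names are taken from the question:

using BlazarTech.QueryableValues;

// AsQueryableValues turns the in-memory ids into a composable IQueryable<int>
// that SQL Server reads via OPENJSON, so all 18k ids go down in a single query.
var qvIds = db.AsQueryableValues(refIds);

var references = await db.Set<reference>()
    .AsNoTracking()
    .Where(x => qvIds.Contains(x.reference_id))
    .ToListAsync();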
In a read scenario this can be improved further by projecting to a view model / DTO with just the columns you need from reference (for instance, if reference has a significant number of columns of various sizes but you only need a handful of them).
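A sketch of such a projection; ReferenceDto and its Name property are hypothetical stand-ins for whichever columns you actually need:

// Hypothetical DTO holding only the needed columns.
public class ReferenceDto
{
    public int ReferenceId { get; set; }
    public string Name { get; set; }     // stand-in for a real column
}

var dtos = await db.Set<reference>()
    .Where(x => qvIds.Contains(x.reference_id))
    .Select(x => new ReferenceDto
    {
        ReferenceId = x.reference_id,
        Name = x.name                    // assumed column name; adjust to your schema
    })
    .ToListAsync();                      // the generated SQL selects only these columns, not entire rows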