I have a problem inserting a huge amount of data into SQL Server.
Previously I was using Entity Framework, but it was painfully slow for just 100K root-level records (which also carry two distinct child collections of roughly 200K records each), i.e. roughly 500K-600K records in memory. I applied all the usual optimizations here (e.g. AutoDetectChangesEnabled = false, and recreating and disposing the context after each batch).
I rejected that solution and used BulkInsert instead, which is substantially faster and more efficient; I was able to insert the 100K records in about a minute.
But the main problem is getting the primary keys back for the newly inserted records. For this, I am thinking of writing a stored procedure that operates on a TVP (i.e. an in-memory data table holding all 100K root-level records) and uses OUTPUT INSERTED.Id in the insert so that the application gets all the new primary keys back.
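Roughly what I have in mind (a minimal sketch only; the dbo.Item table, the dbo.ItemTvp type, the dbo.InsertItems procedure and the connection string are placeholders, not real names from my code):

```csharp
using System;
using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient on full framework

// Sketch of the TVP + OUTPUT INSERTED.Id idea. Assumes a target table
// dbo.Item (Id INT IDENTITY PRIMARY KEY, Name NVARCHAR(100) NOT NULL).
class TvpInsertSketch
{
    static void Main()
    {
        using var conn = new SqlConnection(
            "Server=.;Database=Test;Integrated Security=true;TrustServerCertificate=true");
        conn.Open();

        // One-off setup: a table type plus a procedure that inserts from the TVP
        // and returns the generated keys. Name is echoed back for correlation,
        // because the OUTPUT row order is not guaranteed to match the TVP order.
        Exec(conn, "CREATE TYPE dbo.ItemTvp AS TABLE (Name NVARCHAR(100) NOT NULL);");
        Exec(conn, @"
            CREATE PROCEDURE dbo.InsertItems @rows dbo.ItemTvp READONLY AS
            BEGIN
                SET NOCOUNT ON;
                INSERT INTO dbo.Item (Name)
                OUTPUT INSERTED.Id, INSERTED.Name
                SELECT Name FROM @rows;
            END;");

        // The in-memory data table holding the 100K root-level records.
        var rows = new DataTable();
        rows.Columns.Add("Name", typeof(string));
        for (int i = 0; i < 100_000; i++) rows.Rows.Add("Item " + i);

        using var cmd = new SqlCommand("dbo.InsertItems", conn)
        {
            CommandType = CommandType.StoredProcedure,
            CommandTimeout = 300 // give the large insert some room
        };
        var p = cmd.Parameters.AddWithValue("@rows", rows);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ItemTvp";

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            int id = reader.GetInt32(0);       // new primary key
            string name = reader.GetString(1); // used to match the key back to the source record
        }
    }

    static void Exec(SqlConnection conn, string sql)
    {
        using var cmd = new SqlCommand(sql, conn);
        cmd.ExecuteNonQuery();
    }
}
```

The echoed Name column is only there so the returned keys can be matched back to the in-memory rows; in practice it would be some unique correlation value.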
So, how does this approach (i.e. a SQL INSERT query inside a stored procedure) compare with the SqlBulkCopy approach?
Any idea if I can somehow get all the primary keys back after a SqlBulkCopy operation? Or anything concrete on whether OUTPUT INSERTED.Id would return all the correct new keys to the application?
PS: I don't want to create any staging table during the process; that's just overhead.
Here's an example based on the discussion in the comments, expanding on the idea mentioned here: Possible to get PrimaryKey IDs back after a SQL BulkCopy?
I've not had a chance to test this, but hopefully it will help:
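A minimal sketch of that staging-table idea, assuming the same hypothetical target table dbo.Item (Id INT IDENTITY PRIMARY KEY, Name NVARCHAR(100)); the staging-table shape, column names and connection string are placeholders:

```csharp
using System;
using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient on full framework

// Sketch: bulk copy into a staging table, then move the rows into the real
// table with OUTPUT so the new identity values come back to the application.
class BulkCopyWithIdsSketch
{
    static void Main()
    {
        // The in-memory data to insert (100K root-level records in your case).
        var rows = new DataTable();
        rows.Columns.Add("Name", typeof(string));
        for (int i = 0; i < 100_000; i++) rows.Rows.Add("Item " + i);

        using var conn = new SqlConnection(
            "Server=.;Database=Test;Integrated Security=true;TrustServerCertificate=true");
        conn.Open();

        // 1. Staging table: a session-scoped temp table, so everything below
        //    must run on this same open connection.
        using (var create = new SqlCommand("CREATE TABLE #Staging (Name NVARCHAR(100) NOT NULL);", conn))
            create.ExecuteNonQuery();

        // 2. SqlBulkCopy into the staging table - this is the fast part.
        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Staging", BatchSize = 10_000 })
        {
            bulk.ColumnMappings.Add("Name", "Name");
            bulk.WriteToServer(rows);
        }

        // 3. Move the rows into the real table and capture the generated keys.
        //    Name is returned too so each Id can be matched back to its source
        //    record (ideally you'd return a unique correlation column instead).
        const string moveSql = @"
            INSERT INTO dbo.Item (Name)
            OUTPUT INSERTED.Id, INSERTED.Name
            SELECT Name FROM #Staging;";

        using var move = new SqlCommand(moveSql, conn) { CommandTimeout = 300 };
        using var reader = move.ExecuteReader();
        while (reader.Read())
        {
            int id = reader.GetInt32(0);
            string name = reader.GetString(1);
            // map id back onto the corresponding in-memory object here
        }
    }
}
```

The overhead being weighed here is only the CREATE TABLE #Staging and the final INSERT ... SELECT; the bulk copy itself stays as fast as before, and because the temp table lives and dies with the connection there is nothing to clean up afterwards.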
Whilst in your question you've said that you don't want to use a staging table because it's an "overhead"... please give it a try. You may find that the small overhead of creating a staging table is less than the performance gain from using this method.
Obviously it's not going to be as fast as inserting and ignoring the returned ids; but if that's your requirement, in the absence of other answers, this may be the best alternative.