I have a large table (5 million rows) with a unique identifier column called 'unique_id'. I'm running the INSERT query through Node.js (node-mysql bindings), and there's a chance that duplicate rows could be submitted for insertion.
The two solutions are:
1) Make 'unique_id' a regular index, and check the table for an existing record prior to each INSERT:
'SELECT unique_id FROM example WHERE unique_id = "' + unique_id + '" LIMIT 1'
2) Make 'unique_id' a unique index within MySQL, and perform the INSERT without checking for duplicates. Any duplicate would then cause an error and the row would not be inserted (see the sketch below).
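Concretely, solution 2 would look something like this (the table name example and the sample value are just illustrations):

```sql
-- One-time schema change: let MySQL itself enforce uniqueness
ALTER TABLE example ADD UNIQUE INDEX idx_unique_id (unique_id);

-- Every insert is then attempted directly; MySQL rejects any
-- row whose unique_id already exists in the table
INSERT INTO example (unique_id) VALUES ('abc123');
```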
My hunch is that solution 2) is better, as it avoids a worst-case search of (5 million - 1) rows for a duplicate on every insert.
Are there any downsides to using solution 2)?
There are a number of advantages to defining a unique, primary index for the unique_id column: uniqueness is enforced by MySQL itself rather than by application code, the extra SELECT round trip before every INSERT disappears, and there is no race window in which two concurrent clients both pass the duplicate check before inserting.
With the 2nd solution you do need to handle the attempt to insert a duplicate (unless your unique ids are generated by MySQL). MySQL reports this as error 1062, which node-mysql surfaces as ER_DUP_ENTRY; a sketch follows below.
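For example, you can attempt the INSERT and treat a duplicate-key error as a normal outcome. This is only a sketch: the connection settings, the table name example, and the insertUniqueId helper are placeholders, assuming the mysql (node-mysql) package:

```js
var mysql = require('mysql');

// Placeholder connection settings
var connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'password',
  database: 'mydb'
});

// Hypothetical helper: inserts a row, reporting whether it was new
function insertUniqueId(uniqueId, callback) {
  // '?' placeholders let the driver escape the value safely
  connection.query(
    'INSERT INTO example (unique_id) VALUES (?)',
    [uniqueId],
    function (err, result) {
      if (err && err.code === 'ER_DUP_ENTRY') {
        // The unique index rejected a duplicate; not fatal here
        return callback(null, { inserted: false });
      }
      if (err) {
        return callback(err); // some other MySQL error
      }
      return callback(null, { inserted: true });
    }
  );
}
```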
Autoincremented primary index: https://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html
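If you can let MySQL generate the identifiers, an auto-incremented primary key avoids duplicate attempts altogether. A minimal sketch (table and column definitions are illustrative):

```sql
-- MySQL assigns unique_id itself, so the application never
-- has to guard against duplicate identifiers
CREATE TABLE example (
  unique_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  payload VARCHAR(255),
  PRIMARY KEY (unique_id)
);
```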