T-SQL string unique ID (Northwind database)


I've been trying to get this right for some time now, to no avail.

I have a table in an MS SQL Server database and I want to insert a new row using a stored procedure.

CREATE TABLE "Customers" (
"CustomerID" NCHAR(5) NOT NULL,
"CompanyName" NVARCHAR(40) NOT NULL,
"ContactName" NVARCHAR(30) NULL,
"ContactTitle" NVARCHAR(30) NULL,
"Address" NVARCHAR(60) NULL,
"City" NVARCHAR(15) NULL,
"Region" NVARCHAR(15) NULL,
"PostalCode" NVARCHAR(10) NULL,
"Country" NVARCHAR(15) NULL,
"Phone" NVARCHAR(24) NULL,
"Fax" NVARCHAR(24) NULL,
PRIMARY KEY ("CustomerID")
);

The problem is the CustomerID field, which contains a unique string for each record (ALFKI, ANATR, BERGS, etc.)

I want to make a stored procedure which will insert a row with new data and create a unique CustomerID. Built-in functions are out of the question, as I need the string to be 5 characters long.

I have a procedure which generates a 5-character ID as follows

begin

declare @chars char(26) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
declare @i int = 0
declare @id varchar(max) = ''

while @i < 5
begin
        set @id = @id + substring(@chars, cast(ceiling(rand() * 26) as int), 1)

    set @i = @i + 1
end

Select (cast(@id as nvarchar(400)))

end

And here is the one that I tried to make work, to no avail. It is supposed to select a unique ID (set @id = 'ANATR' is there on purpose to make it go into the loop).

begin
declare @randID varchar(5) = ''
declare @selectID varchar(20) = ''
declare @chars char(26) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
declare @i int = 0
declare @id varchar(10) = ''

while @i < 5
begin
   set @id = @id + substring(@chars, cast(ceiling(rand() * 26) as int), 1)           
    set @i = @i + 1
end
select @id
set @id = 'ANATR'

SET @selectID = (SELECT CustomerID FROM CUSTOMERS WHERE CustomerID = @id)


while @selectID <> 'NULL'
begin
    set @id = ''
    while @i < 5
        begin
            set @id = @id + substring(@chars, cast(ceiling(rand() * 26) as int), 1)           
            set @i = @i + 1
        end

    SET @selectID = (SELECT CustomerID FROM CUSTOMERS WHERE CustomerID = @id)   

    SELECT @id
end


end

Here is the insert procedure I have at the moment

CREATE PROCEDURE [dbo].[InsertCustomers]
(
    @CustomerID nchar(5),
    @CompanyName nvarchar(40),
    @ContactName nvarchar(30) = NULL,
    @ContactTitle nvarchar(30) = NULL,
    @Address nvarchar(60) = NULL,
    @City nvarchar(15) = NULL,
    @Region nvarchar(15) = NULL,
    @PostalCode nvarchar(10) = NULL,
    @Country nvarchar(15) = NULL,
    @Phone nvarchar(24) = NULL,
    @Fax nvarchar(24) = NULL
)
AS
SET NOCOUNT OFF;

INSERT INTO [dbo].[Customers] ([CustomerID], [CompanyName], [ContactName], [ContactTitle], [Address], [City], [Region], [PostalCode], [Country], [Phone], [Fax])
VALUES (@CustomerID, @CompanyName, @ContactName, @ContactTitle, @Address, @City, @Region, @PostalCode, @Country, @Phone, @Fax);

There are 3 answers

M.Ali On BEST ANSWER

I believe you can do something like this to make sure you always get a unique ID

begin

declare @chars char(26) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
declare @id varchar(5) = ''

while (1=1)
begin
    -- append one randomly chosen letter
    set @id = @id + substring(@chars, cast(ceiling(rand() * 26) as int), 1)

    if len(@id) = 5
    begin
        -- stop as soon as a full-length ID is not already taken
        if not exists (select * from Customers where CustomerID = @id)
            break

        -- collision: start over instead of growing past 5 characters
        set @id = ''
    end
end

select cast(@id as nchar(5))

end

Set the while condition to always be true and break out of the loop only when both of your requirements are met, i.e. the new ID is 5 characters long and it does not already exist in the Customers table.
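To tie this back to the insert procedure from the question, here is a rough sketch of how the generate-and-check idea could sit inside one procedure. The procedure name, the cap of 50 attempts, and the use of NEWID() instead of rand() are my own assumptions, and only two of the columns are shown. The existence check and the insert happen in one statement, but two sessions could in principle still pick the same ID at the same moment and one of them would get a primary key error, so treat it as a starting point rather than a finished solution.

```sql
-- Hypothetical procedure; only two of the Customers columns are shown.
CREATE PROCEDURE dbo.InsertCustomerWithGeneratedID
  @CompanyName NVARCHAR(40)
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @id NCHAR(5);
  DECLARE @attempt INT = 0;

  WHILE @attempt < 50   -- arbitrary cap so the loop always terminates
  BEGIN
    SET @attempt += 1;

    -- ABS(CHECKSUM(NEWID()) % 26) is an integer from 0 to 25,
    -- so each NCHAR(65 + ...) is one letter from A to Z.
    SET @id = NCHAR(65 + ABS(CHECKSUM(NEWID()) % 26))
            + NCHAR(65 + ABS(CHECKSUM(NEWID()) % 26))
            + NCHAR(65 + ABS(CHECKSUM(NEWID()) % 26))
            + NCHAR(65 + ABS(CHECKSUM(NEWID()) % 26))
            + NCHAR(65 + ABS(CHECKSUM(NEWID()) % 26));

    -- Insert only if the candidate is not already taken.
    INSERT dbo.Customers (CustomerID, CompanyName)
    SELECT @id, @CompanyName
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Customers WHERE CustomerID = @id);

    IF @@ROWCOUNT = 1
    BEGIN
      SELECT @id AS CustomerID;   -- hand the new key back to the caller
      RETURN;
    END
  END

  RAISERROR(N'No unused CustomerID found after 50 attempts.', 16, 1);
END
```

Calling it would look like EXEC dbo.InsertCustomerWithGeneratedID @CompanyName = N'Some Company'; and it returns the generated CustomerID as a one-row result set.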

Aaron Bertrand On

The main problem here is that the incremental cost of detecting a collision and trying again increases as you generate more and more strings (since you have to check the existing strings to make sure you didn't generate a duplicate). At the same time, the odds of hitting a duplicate go up, meaning the bigger the table gets, the slower this process will get.

Why do you need to generate the unique string at runtime? Build them all in advance. This article and this post are about random numbers, but the basic concept is the same. You build up a set of unique strings and pull one off the stack when you need one. Your chance of collisions stays constant at 0% throughout the lifetime of the application (provided you build up a stack of enough unique values). Pay for the cost of collisions up front, in your own setup, instead of incrementally over time (and at the cost of a user waiting for those attempts to finally yield a unique number).

This will generate 100,000 unique 5-character strings, at the low, one-time cost of about 1 second (on my machine):

;WITH 
 a(a) AS 
 (
   SELECT TOP (26) number + 65 FROM master..spt_values 
   WHERE type = N'P' ORDER BY number
 ),
 b(a) AS 
 (
   SELECT TOP (10) a FROM a ORDER BY NEWID()
 )
SELECT DISTINCT CHAR(b.a) + CHAR(c.a) + CHAR(d.a) + CHAR(e.a) + CHAR(f.a)
FROM b, b AS c, b AS d, b AS e, b AS f;

That's not enough? Changing TOP (10) to TOP (20) gives you 20^5 = 3.2 million unique values. This took 18 seconds. Still not enough? TOP (24) will give you just under 8 million in about 2 minutes. It gets much more expensive as you generate more strings, because that DISTINCT has to do the same duplicate checking you want to do every single time you add a customer.

So, create a table:

CREATE TABLE dbo.StringStack
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  String CHAR(5) NOT NULL UNIQUE
);

Insert that set:

;WITH 
 a(a) AS 
 (
   SELECT TOP (26) number + 65 FROM master..spt_values 
   WHERE type = N'P' ORDER BY number
 ),
 b(a) AS 
 (
   SELECT TOP (10) a FROM a ORDER BY NEWID()
 )
INSERT dbo.StringStack(String)
SELECT DISTINCT CHAR(b.a) + CHAR(c.a) + CHAR(d.a) + CHAR(e.a) + CHAR(f.a)
FROM b, b AS c, b AS d, b AS e, b AS f;

And then just create a procedure that pops one off the stack when you need it:

CREATE PROCEDURE dbo.AddCustomer
  @CompanyName NVARCHAR(40) /* , other params */
AS
BEGIN
  SET NOCOUNT ON;
  
  -- remove one pre-generated string and insert it as the new CustomerID
  DELETE TOP (1) dbo.StringStack
    OUTPUT deleted.String, @CompanyName /* , other params */
    INTO dbo.Customers(CustomerID, CompanyName /*, ...other columns... */);
END
GO
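One caveat, as far as I know: OUTPUT ... INTO cannot target a table that has enabled triggers or that takes part in a foreign key, and in the stock Northwind database Orders references Customers. If that applies to you, a variation is to pop the string into a table variable first and insert from there. This is only a sketch; the procedure name and the @popped variable are made up for illustration.

```sql
-- Hypothetical variant of the pop-from-stack procedure.
CREATE PROCEDURE dbo.AddCustomerViaTableVariable
  @CompanyName NVARCHAR(40) /* , other params */
AS
BEGIN
  SET NOCOUNT ON;
  SET XACT_ABORT ON;   -- roll back automatically if the insert fails

  DECLARE @popped TABLE (String CHAR(5) NOT NULL);

  BEGIN TRANSACTION;

  -- Take one string off the stack and remember it.
  DELETE TOP (1) dbo.StringStack
    OUTPUT deleted.String INTO @popped (String);

  IF NOT EXISTS (SELECT 1 FROM @popped)
  BEGIN
    ROLLBACK TRANSACTION;
    RAISERROR(N'dbo.StringStack is empty.', 16, 1);
    RETURN;
  END

  -- Use the popped string as the new CustomerID.
  INSERT dbo.Customers (CustomerID, CompanyName /*, ...other columns... */)
  SELECT String, @CompanyName /* , other params */
  FROM @popped;

  COMMIT TRANSACTION;
END
```

Doing the delete and the insert in one transaction means a failed insert does not burn a string from the stack.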

No silly looping, no need to check whether the CustomerID you just generated already exists, etc. The only additional thing you'll want to build is some type of check that notifies you when you're getting low.
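A low-stock check could be as small as the following, run on whatever schedule suits you. The threshold of 1000 is an arbitrary placeholder, and how you actually get notified (a scheduled job, an alert on the message, etc.) is left open.

```sql
-- Hypothetical threshold; pick one that matches your insert rate.
DECLARE @remaining INT = (SELECT COUNT(*) FROM dbo.StringStack);

IF @remaining < 1000
  -- Severity 10 is informational; it reports the count without failing the batch.
  RAISERROR(N'dbo.StringStack is running low: %d strings left.', 10, 1, @remaining);
```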

As an aside, these are terrible identifiers for a CustomerID. What is wrong with a sequential surrogate key, like an IDENTITY column? How is a 5-character random string, with all this effort involved, any better than a unique number the system can generate for you much more easily?

Nevoris On

Muhammed Ali's answer works, but will prove rather resource-intensive (especially when there aren't many combinations of 5 letters left to use): it relies on the random generator, which can take a while to find a combination that isn't used, especially since it has a very limited memory of its previous results. This means it will try, and might give you something of the sort (exaggerating a bit): BAGER the first time, then ANSWE the second time, then BAGER again the third time. You will lose a good amount of time with the generator giving you the same answer over and over again (especially with roughly 12M possible combinations).

If you are looking for a fixed-length ID (since you use NCHAR(5), I guess that's a good assumption), I would rather look into building a table that contains all the possible combinations, and pick one value from this table every time you need one. You would delete it once it has been used, or mark it as used (which I would prefer, for reusability reasons).
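A rough sketch of the "mark it as used" variant might look like this. The table name dbo.CustomerIDPool and the Used column are my own inventions, and the table would still have to be filled with the combinations beforehand.

```sql
-- Hypothetical pool table: one row per possible ID, flagged once handed out.
CREATE TABLE dbo.CustomerIDPool
(
  String CHAR(5) NOT NULL PRIMARY KEY,
  Used   BIT     NOT NULL DEFAULT 0
);
GO

-- Claim one unused ID and return it in a single statement.
UPDATE TOP (1) dbo.CustomerIDPool
  SET Used = 1
  OUTPUT inserted.String
  WHERE Used = 0;
```

Note that UPDATE TOP (1) picks an arbitrary unused row, not a random one, so if the IDs must look random you would need to randomize which row is chosen (or the order in which the pool is consumed).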

This leads to my final comment (which I cannot post as a comment because I don't have enough reputation): why not use the IDENTITY property provided by MS-SQL? It handles primary key generation much better...
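For comparison, a minimal sketch of the IDENTITY approach, assuming you are free to change the key type. The table name is hypothetical and only two columns are shown.

```sql
-- Hypothetical table; the key is generated by SQL Server on insert.
CREATE TABLE dbo.CustomersWithIdentity
(
  CustomerID  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  CompanyName NVARCHAR(40) NOT NULL
);
GO

INSERT dbo.CustomersWithIdentity (CompanyName)
VALUES (N'Example Company');

-- SCOPE_IDENTITY() returns the last identity value generated in this scope.
SELECT SCOPE_IDENTITY() AS NewCustomerID;
```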