Binary comparison of strings with SQL Server

74 views Asked by At

Is there a way to perform an accent-sensitive, case-sensitive, "padding-sensitive" (pay attention to trailing spaces) query in SQL Server without having to tweak both operands?

The following script is from db<>fiddle:

create table users (
  id   int,
  name varchar(255)
);
insert into users (id, name) 
  values 
  (1, 'bar'),
  (2, 'bar ');

Here I expect to find just record #2, it finds both: padding insensitive.

select * from users where name = 'bar ';
id name
1 bar
2 bar

It is also case insensitive.

select * from users where name = 'BAR';
id name
1 bar
2 bar

But accent sensitive.

select * from users where name = 'bàz';
id name

Using _BIN2 seems to make no difference, in spite of (my understanding of) what the documentation says (and it feels unnerving to specify "Latin 1": I do not want to restrict my to the sole ISO-8859-1 fragment, I want the full UTF-8 coverage). It should find nothing!

select * from users
where (name collate Latin1_General_BIN2) in ('bàr', 'BAR', 'bar  ');
id name
1 bar
2 bar

Even when applied on both operands.

select * from users
where (name collate Latin1_General_CS_AS) in 
  ('bàr' collate Latin1_General_CS_AS, 
  'BAR'  collate Latin1_General_CS_AS, 
  'bar  ' collate Latin1_General_CS_AS);
id name
1 bar
2 bar

Cast to binary on the left-hand-side only does not work.

select * from users where cast(name as binary) = 'bar';
id name

Casting on both ends works, but I want to avoid that (it scales poorly when trying more complex queries, possibly with tuples, etc.)

select * from users where cast(name as binary) = cast('bar' as binary);
id name
1 bar

Casting of both sides works with in too. But then again, I want to avoid that.

select * from users
where cast(name as binary) in 
  (cast('bàr'  as binary), 
  cast('BAR'   as binary), 
  cast('bar  ' as binary));
id name

So, is there any way to get an exact binary equality test by acting on the left-hand-side only? Something similar to binary name = 'bar' in MySQL.

Thanks in advance!

1

There are 1 answers

9
Thom A On

It seems that what might be best here is to add a computed column to your table that converts the value to a (var)binary. As name is a varchar(255) I'm going to use a varbinary(255) here. Then you can INDEX that column so that you can query is appropriately. You will, however, still need to CONVERT your input values. You therefore might be better off using parameters, as then you can be explicit on the data type and therefore don't end up with someone passing a varchar and causes an implicit_convert. This gives some DDL and DML like so:

CREATE TABLE dbo.users (id int CONSTRAINT PK_users PRIMARY KEY CLUSTERED,
                        name varchar(255));

INSERT INTO dbo.users (id,
                   name)
VALUES (1, 'bar'),
       (2, 'bar ');
GO

ALTER TABLE dbo.users ADD BinaryName AS CONVERT(varbinary(255),name);
GO
CREATE INDEX IX_users_BinaryName ON dbo.users (BinaryName) INCLUDE (name);
GO

Then a query for a single value might look like this:

DECLARE @name varbinary(255) = CONVERT(varbinary(255),'bàr'); --Implicit conversion isn't allowed

SELECT u.id,
       u.name
FROM dbo.users u
WHERE BinaryName = @name;

If you need to multiple values, you might be better off with a table type variable/parameter and something like this:

DECLARE @Names table (name varbinary(255));

INSERT INTO @Names (name)
SELECT CONVERT(varbinary(255),v.name)
FROM (VALUES('bàr'),('BAR'),('bar  '))V(name);

SELECT u.id,
       u.name
FROM dbo.users u
WHERE EXISTS (SELECT 1
              FROM @Names n
              WHERE n.name = u.BinaryName);