SQL WHERE clause where strings may differ by at most one character but must otherwise be equal


I am trying to write a SQL WHERE clause that compares two strings and treats them as a match when at most one character differs. The strings must still be the same length. I have tried substring searches with LIKE, but that does not work well when the differing character is in the middle of the string. Examples:

Should match:

"ABC123" and "ABC023"
"ABC124" and "ABC125"

Should not match:

"ABC123" and "ACB132"

How can this be achieved with a SQL WHERE clause?


There are 2 answers

0
Thom A

Assuming you are on the latest version of SQL Server (2022 at the time of writing), you could use GENERATE_SERIES to get a row for each character position in your search string, and then use STUFF to replace the character at that position with the single-character wildcard (_). Then you can use a LIKE inside an EXISTS:

DECLARE @YourString varchar(6) = 'ABC123';

SELECT YT.YourColumn
FROM dbo.YourTable YT
WHERE EXISTS(SELECT 1
             FROM GENERATE_SERIES(1,LEN(@YourString)) GS
             WHERE YT.YourColumn LIKE STUFF(@YourString,GS.value,1,'_'));
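
To see why this works, it can help to look at the patterns the subquery produces. The following is a minimal sketch that reuses @YourString from the query above and only lists the generated patterns; the column aliases Position and Pattern are my own naming.

```sql
DECLARE @YourString varchar(6) = 'ABC123';

-- One pattern per character position. In LIKE, '_' matches exactly one
-- character, so every pattern still matches only 6-character values.
SELECT GS.value AS Position,
       STUFF(@YourString, GS.value, 1, '_') AS Pattern
FROM GENERATE_SERIES(1, LEN(@YourString)) GS
ORDER BY GS.value;
```

For 'ABC123' this gives _BC123, A_C123, AB_123, ABC_23, ABC1_3 and ABC12_. A value matches the EXISTS if it fits any one of them, which is the "at most one character differs, same length" rule. Note that if the search string itself can contain LIKE wildcard characters (%, _ or [), they would be treated as wildcards too and would need escaping.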


If you aren't on SQL Server 2022 or later, you could use your own tally function in place of GENERATE_SERIES to achieve the same result.
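
As a rough sketch of that idea, an inline tally built from a VALUES list can stand in for GENERATE_SERIES. The CTE names N and Tally and the column I are hypothetical, and the 10 x 10 cross join caps this version at search strings of 100 characters; dbo.YourTable and YourColumn are the same placeholders as in the query above.

```sql
DECLARE @YourString varchar(6) = 'ABC123';

-- N holds 10 rows; cross joining it with itself gives up to 100 rows.
WITH N AS (
    SELECT N
    FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),
                 (NULL),(NULL),(NULL),(NULL),(NULL)) V(N)
),
-- Tally numbers the rows 1..LEN(@YourString), one per character position.
Tally AS (
    SELECT TOP (LEN(@YourString))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1
         CROSS JOIN N N2
)
SELECT YT.YourColumn
FROM dbo.YourTable YT
WHERE EXISTS (SELECT 1
              FROM Tally T
              WHERE YT.YourColumn LIKE STUFF(@YourString, T.I, 1, '_'));
```

If longer strings are possible, adding another CROSS JOIN to N raises the limit to 1,000 rows.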

1
Isolated

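You can create a Levenshtein distance function and then call that function in your WHERE clause. A distance of at most 1 allows a single insertion, deletion or substitution; because an insertion or deletion changes the length, also requiring equal lengths leaves only a single substitution.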

CREATE FUNCTION edit_distance(@s1 nvarchar(3999), @s2 nvarchar(3999))
RETURNS int
AS
BEGIN
 DECLARE @s1_len int, @s2_len int
 DECLARE @i int, @j int, @s1_char nchar, @c int, @c_temp int
 DECLARE @cv0 varbinary(8000), @cv1 varbinary(8000)

 SELECT
  @s1_len = LEN(@s1),
  @s2_len = LEN(@s2),
  @cv1 = 0x0000,
  @j = 1, @i = 1, @c = 0

 WHILE @j <= @s2_len
  SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1

 WHILE @i <= @s1_len
 BEGIN
  SELECT
   @s1_char = SUBSTRING(@s1, @i, 1),
   @c = @i,
   @cv0 = CAST(@i AS binary(2)),
   @j = 1

  WHILE @j <= @s2_len
  BEGIN
   SET @c = @c + 1
   SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +
    CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
   IF @c > @c_temp SET @c = @c_temp
   SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1
   IF @c > @c_temp SET @c = @c_temp
   SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
  END

 SELECT @cv1 = @cv0, @i = @i + 1
 END

 RETURN @c
END
-- CREATE FUNCTION must be the only statement in its batch;
-- GO is the batch separator in SSMS and sqlcmd.
GO

create table table1 (
  id integer,
  string1 varchar(50),
  string2 varchar(50)
  );

insert into table1 values
(1, 'ABC123', 'ABC023'),
(2, 'ABC123', 'ABC12'),
(3, 'ABC124', 'ABC125'),
(4, 'ABC123', 'ABC132');

select *
  from table1
 where dbo.edit_distance(string1,string2) <= 1
   and len(string1) = len(string2)

Result:

id  string1  string2
1   ABC123   ABC023
3   ABC124   ABC125
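
To check why rows 2 and 4 drop out, you could list the distance and the length comparison for every row. This is only a diagnostic sketch and assumes dbo.edit_distance and table1 from above already exist; the aliases distance and same_length are my own naming.

```sql
-- Show the two conditions from the WHERE clause as columns.
SELECT id,
       string1,
       string2,
       dbo.edit_distance(string1, string2) AS distance,
       CASE WHEN LEN(string1) = LEN(string2) THEN 1 ELSE 0 END AS same_length
FROM table1
ORDER BY id;
```

Row 2 ('ABC123' vs 'ABC12') should show a distance of 1 but fail the length check, since it is one deletion away. Row 4 ('ABC123' vs 'ABC132') should show a distance of 2, because this function counts two swapped characters as two substitutions.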
