URL Decode (for Extended Ascii) using TSQL

1.1k views Asked by At

I am needing some help fine tuning a T-SQL function to properly DECODE a string which contains URL. Just the query string parameter values are URLEcoded (and not the whole URL). The original function works well when decoding single byte characters however it doesn't handle multiple byte characters. In order to address the DECODING of multiple byte characters such as Spanish Accented characters; my plan was to find the values using PATINDEX and replace those using a lookup table (this is because we are dealing with a small finite number of specials characters which fall in this category).

Problem: The Pattern specified below is not returning any matches therefore I am pretty much stuck at this point.

Example: Pattern '%[%][0-9a-f][0-9a-f]%' works for single byte encoded chars. Similarly Pattern '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%' should work for double byte chars such as (%C3%A9 -> ) but it does not.

Here is my code:

 DECLARE @Position INT,
    @Base CHAR(16),
    @High TINYINT,
    @Low TINYINT,
    @Pattern VARCHAR(256),
    @URL VARCHAR(8000)

SET @Url = '%26Text1%3DFrom%20Ren%C3%A9%27s'

SELECT  @Base = '0123456789abcdef',
    @Pattern = '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%',
    --@URL = REPLACE(@URL, '+', ' '),
    @Position = PATINDEX(@Pattern, @URL)

PRINT 'Position: ' + + CAST(@Position AS Varchar(256))

WHILE @Position > 0
    BEGIN
    SELECT  
        @High = CHARINDEX(SUBSTRING(@URL, @Position + 1, 1), @Base COLLATE Latin1_General_CI_AS),
        @Low = CHARINDEX(SUBSTRING(@URL, @Position + 2, 1), @Base COLLATE Latin1_General_CI_AS),
        @URL = STUFF(@URL, @Position, 6, '123456'),
        @Position = PATINDEX(@Pattern, @URL)

    PRINT 'High: ' + CAST(@High AS Varchar(256))

    PRINT @URL
END 
2

There are 2 answers

2
paparazzo On

@Pattern CHAR(21) is truncating

see

set nocount on
 DECLARE @Position INT,
    @Base CHAR(16),
    @High TINYINT,
    @Low TINYINT,
    @Pattern VARCHAR(200),
    @URL VARCHAR(8000)

SET @Url = '%26Text1%3DFrom%20Ren%C3%A9%27s'

SELECT  @Base = '0123456789abcdef',
    @Pattern = '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%',
    --@URL = REPLACE(@URL, '+', ' '),
    @Position = PATINDEX(@Pattern, @URL)

select @URL
select @Pattern    
select @position
0
Robert On

I had a syntax error within the pattern. After reading the documentation closely I realized that I needed to escape % sign using an additional percent sign. Here is the working solution (the sub query to replace the value is not working but the pattern is):

    DECLARE @Position INT,
    @Base CHAR(16),
    @High TINYINT,
    @Low TINYINT,
    @Pattern nVARCHAR(256),
    @ToReplace nVARCHAR(256),
    @ReplaceWith nVARCHAR(256),
    @URL nVARCHAR(4000)

SET @Url = '%26Text1%3DFrom%20Ren%C3%A9%27s%C3'

SELECT  @Base = '0123456789abcdef',
    @Pattern = '%[%%][c-f][0-9]%%[0-9a-f]%',
    --@URL = REPLACE(@URL, '+', ' '),
    @Position = PATINDEX(@Pattern, @URL)

  PRINT 'Position: ' + + CAST(@Position AS Varchar(256))

WHILE @Position > 0
  BEGIN
  SELECT  
        @High = CHARINDEX(SUBSTRING(@URL, @Position + 1, 1), @Base COLLATE Latin1_General_CI_AS),
        @Low = CHARINDEX(SUBSTRING(@URL, @Position + 2, 1), @Base COLLATE Latin1_General_CI_AS),
        @ToReplace = SUBSTRING(@URL, @Position, 6),
        @ReplaceWith = (SELECT COALESCE([Text], 'Something') FROM dbo.ExtendedAsciiLookup WHERE UTF = @ToReplace),
        @URL = STUFF(@URL, @Position, 6, @ReplaceWith),
        @Position = PATINDEX(@Pattern, @URL)

        PRINT 'High: ' + CAST(@High AS Varchar(256))
        PRINT '@ToReplace: ' + CAST(COALESCE(@ToReplace,'') AS nVARCHAR(256))
        PRINT 'With: ' + CAST(COALESCE(@ReplaceWith,'') AS VARCHAR(256))

        PRINT @URL
END