I have difficulties trying to understand the behavior of PATINDEX when matching the caret symbol "^". I am both using it in it's exclusionary function, and matching it itself. I am trying different combinations of varchar and nvarchar for match pattern and input. The results are not what I expect. I could not find helpful info online.
Sample code, run on SQL Server 2019, collation is SQL_Latin1_General_CP1_CI_AS:
-- Using patindex to match carets and non-carets confuses me.
-- I expect no matches, all results should be zero,
-- but only the first field returns a zero, the others get ones/find matches.
SELECT
'match_^__v_v' = PATINDEX( '^[^^]%', '^^Z' COLLATE Latin1_GENERAL_CI_AS)
,'match_^__v_nv' = PATINDEX( '^[^^]%', N'^^Z' COLLATE Latin1_GENERAL_CI_AS)
,'match_^__nv_nv' = PATINDEX( N'^[^^]%', N'^^Z' COLLATE Latin1_GENERAL_CI_AS)
,'match_^__nv_v' = PATINDEX( N'^[^^]%', '^^Z' COLLATE Latin1_GENERAL_CI_AS)
-- Same issues for this attempt with additional square brackets,
-- with the same unexpected results as above:
SELECT
'match_[^]__v_v' = PATINDEX( '[^][^^]%', '^^Z' COLLATE Latin1_GENERAL_CI_AS)
,'match_[^]__v_nv' = PATINDEX( '[^][^^]%', N'^^Z' COLLATE Latin1_GENERAL_CI_AS)
,'match_[^]__nv_nv' = PATINDEX( N'[^][^^]%', N'^^Z' COLLATE Latin1_GENERAL_CI_AS)
,'match_[^]__nv_v' = PATINDEX( N'[^][^^]%', '^^Z' COLLATE Latin1_GENERAL_CI_AS)
-- Similar operation for "normal" letter shows none of the issues.
-- All fields return zero, as expected.
SELECT
'match_a__v_v' = PATINDEX( 'a[^a]%', 'aaZ' COLLATE Latin1_GENERAL_CI_AS)
,'match_a__v_nv' = PATINDEX( 'a[^a]%', N'aaZ' COLLATE Latin1_GENERAL_CI_AS)
,'match_a__nv_nv' = PATINDEX( N'a[^a]%', N'aaZ' COLLATE Latin1_GENERAL_CI_AS)
,'match_a__nv_v' = PATINDEX( N'a[^a]%', 'aaZ' COLLATE Latin1_GENERAL_CI_AS)
Questions:
- There should be no match, all fields should get zero. Why are the results not as expected?
- Why is PATINDEX's behavior different for varchar and nvarchar?
- Where exactly should I use varchar and/or nvarchar? Should I simply make match pattern and input the same type?
Clearly this has to do with the special function of the caret symbol. What am I missing?
I added COLLATE Latin1_GENERAL_CI_AS, after learning about the significance of the relationship of collation and data type, and after the provided sql fiddle in the comments. Thanks @Martin-Smith and @Thom-A! This makes all formulas with carets fail the same way.
I found pointers to a workaround here. This gets a zero, as expected. It uses a binary collation for exact matches by ASCII code and a range within the square brackets. The range can be defined with the same character as first and last. The meaning of the square brackets is [not the characters between ^ and ^].
SELECT "match_nv_nv" = PATINDEX( N'^[^^-^]%', N'^^Z' COLLATE Latin1_General_BIN2)