How to remove specific characters from select column in Snowflake

172 views Asked by At

I have loaded data into table using copy into from stage file(CSV).

+--------------------------------------------------------------------+--------+
| STRING                                                             | RESULT |
+--------------------------------------------------------------------+--------+
| hhhhrerererereNRD\r\n\r\nthe 193how (test) testtest peo\r\n\r\n    |  30.00 |
+--------------------------------------------------------------------+--------+

my select query expected output is like below:

+---------------------------------------------------------+--------+
| STRING                                                  | RESULT |
+---------------------------------------------------------+--------+
| hhhhrerererereNRDthe 193how (test) testtest peo         |  30.00 |
+---------------------------------------------------------+--------+

I have tried following option like :

SELECT
    regexp_replace(STRING,'\r\n\r\n','')    
FROM  test; 

It is not producing expected output.

3

There are 3 answers

0
Simeon Pilgrim On BEST ANSWER

step one, reproduce "the problem"

select * from values
('abc\r\n\r\nthe 193how (test) testtest peo\r\n\r\n'),
('abc\\r\\n\\r\\nthe 193how (test) testtest peo\\r\\n\\r\\n');

gives:

enter image description here

which shows that \r and \n in the input is parsed by the SQL parser to mean newline and carriage return, thus we need to escape them to get them into the output:

what is worth noting is that in the output panel we can see the first is actually have new lines shown to be present and effecting stuff:

enter image description here

so now we can do the replacements we want

we can see using the replace you have, does have impact:

select *
    ,replace($1, '\n\r', '')
from values
('abc\r\n\r\nthe 193how (test) testtest peo\r\n\r\n'),
('abc\\r\\n\\r\\nthe 193how (test) testtest peo\\r\\n\\r\\n');

enter image description here

so if you output looks like the second row, then you don't have a cariage return but \ & \r in your string. Thus to match that we need to match \\r or \\n BUT regexp also has a parser, so the input needs to be \\\\r to match \ & \r

select *
    ,regexp_replace($1, '\\\\n', '') as no_n
    ,regexp_replace($1, '\\\\r', '') as no_r
from values
('abc\r\n\r\nthe 193how (test) testtest peo\r\n\r\n'),
('abc\\r\\n\\r\\nthe 193how (test) testtest peo\\r\\n\\r\\n');

enter image description here

now in the lower row we see \n or \r removed, now to match ether, we need to put an Grouping and OR in the regexp:

select *
    ,regexp_replace($1, '(\\\\n|\\\\r)', '') as no_n_or_r
from values
('abc\r\n\r\nthe 193how (test) testtest peo\r\n\r\n'),
('abc\\r\\n\\r\\nthe 193how (test) testtest peo\\r\\n\\r\\n');

enter image description here

so if you really want to remove embedded newlines and carriage returns, AND escaped, you will want more OR clauses:

select *
    ,regexp_replace($1, '(\\\\n|\\\\r|\\n|\\r)', '') as no_n_or_r_or_embeds
from values
('abc\r\n\r\nthe 193how (test) testtest peo\r\n\r\n'),
('abc\\r\\n\\r\\nthe 193how (test) testtest peo\\r\\n\\r\\n');

enter image description here

and now we can see both have correctly be stripped out.

0
akshindesnowflake On
SET VAL='hhhhrerererereNRD\\r\\n\\r\\nthe 193how (test) testtest peo\\r\\n\\r\\n';
SELECT $VAL;
SELECT REGEXP_REPLACE($VAL,'\\\\r\\\\n\\\\r\\\\n','');

\r\n are special characters in string , \r is carriage return and \n is newline character; in first line those are escaped by one extra slash before this special character in order to recognize those are as characters in regular string. In Regular expression for replace you have to use 2 double slash on is order to recognize that as characters.

0
Aishwarya On

Given suggestion try using the replace function and char code in query

Query :

SELECT
    REPLACE(REPLACE(REPLACE(STRING, CHAR(10), ''), CHAR(13), ''), CHAR(9), '')
FROM  test; 

purpose of ASCII CHR:

  • CHR(10) -> Line feed
  • CHR(13) -> Carriage Return
  • CHR(9) -> Horizontal Tab