how to remove all html characters in snowflake, dont want to include all html special characters in query (no hardcoding)

3.6k views Asked by At

Want to remove below kind of characters from string..pl help ' &

2

There are 2 answers

2
Gokhan Atil On BEST ANSWER

You may try this one to remove any HTML special characters:

select REGEXP_REPLACE( 'abc&def³»ghi', '&[^&]+;', '!'  );

Explanation:

REGEXP_REPLACE uses regular expression to search and replace. I search for "&[^&]+;" and replace it with "!" for demonstration. You can of course use '' to remove them. More info about the function:

https://docs.snowflake.com/en/sql-reference/functions/regexp_replace.html

About the regular expression string:

  • & is the & character of a HTML special character
  • [^&] means any character except &. Tthis prevents to REGEXP to replace all characters between the first '&' char and last ';'. It will stop when it see second '&'
  • + means match 1 or more of preceding token (any character except &)
  • ; is the last character of a HTML special character
0
Saurabh Dixit On
CREATE or REPLACE FUNCTION UDF_StripHTML(str varchar)
  returns varchar
  language javascript
  strict
  as 
  'var HTMLParsedText=""
   var resultSet =  STR.split(''>'')
   var resultSetLength =resultSet.length
   var counter=0
   while(resultSetLength>0)
   {
      if(resultSet[counter].indexOf(''<'')>0)
      {    
        var value = resultSet[counter]
        value=value.substring(0, resultSet[counter].indexOf(''<''))
        if (resultSet[counter].indexOf(''&'')>=0 && resultSet[counter].indexOf('';'')>=0) 
        {
            value=value.replace(value.substring(resultSet[counter].indexOf(''&''), resultSet[counter].indexOf('';'')+1),'''')
        }
      }
        if (value)
        {
          value = value.trim();
          if(HTMLParsedText === "")
          {
              HTMLParsedText = value
          }
          else
          {
            if (value) {
              HTMLParsedText = HTMLParsedText + '' '' + value
            }
          }
          value=''''
        }
        counter= counter+1
      resultSetLength=resultSetLength-1
   }
    return HTMLParsedText';
  
to call this UDF :

Select UDF_StripHTML(text)