How to check the records having apostrophes ( ' ) in databricks Sql

62 views Asked by At

Need to find the records having special characters. the allowed characters are Hyphen, comma, forward slash(/), Hash and Dot and apostrophes (').


In the AZURE Databricks Table having few data like  

L'ÎLE-PERROT -- Invalid
CHÉNÉVILLE -- Invalid
1205-RUE DE L'ACADIE - Valid
'4th Floor - InterContinental - Valid
VAL D'OR -- Valid
O'LEARY -- Valid
Bucure¿ti -- Invalid
LA DORÉ - Invalid

I tried using three methods. by using all these methods the values with apostrophes (') is skipped for validation.

  1. By using regexp

    (REGEXP(TRIM(column1 ), '[^a-zA-Z0-9,-\-/#''\s]')
    
  2. By using rlike

    (column1 rlike '[a-zA-Z0-9\\-,/#''\\s]')
    
  3. By using Translate

    (TRANSLATE(column1, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,/#. -') = '')
    
1

There are 1 answers

0
s.polam On

Check below code.

scala> df
.withColumn(
    "new_data", regexp_replace($"data", "[^a-zA-Z0-9,-/#. ']+", "")
)
.withColumn(
    "output",
     when(
        $"data" === $"new_data", lit("valid"))
    .otherwise("invalid")
)
.show(false)

OR

SELECT 
    data,
    REGEXP_REPLACE(data, "[^a-zA-Z0-9,-/#. ']+", "") as new_data,
    (  
        CASE WHEN data = REGEXP_REPLACE(data, "[^a-zA-Z0-9,-/#. ']+", "") 
          THEN 'valid' ELSE 'invalid' 
        END
    ) AS output
FROM input
+-----------------------------+-----------------------------+-------+
|data                         |new_data                     |output |
+-----------------------------+-----------------------------+-------+
|L'ÎLE-PERROT                 |L'LE-PERROT                  |invalid|
|CHÉNÉVILLE                   |CHNVILLE                     |invalid|
|1205-RUE DE L'ACADIE         |1205-RUE DE L'ACADIE         |valid  |
|'4th Floor - InterContinental|'4th Floor - InterContinental|valid  |
|VAL D'OR                     |VAL D'OR                     |valid  |
|O'LEARY                      |O'LEARY                      |valid  |
|Bucure¿ti                    |Bucureti                     |invalid|
|LA DORÉ                      |LA DOR                       |invalid|
+-----------------------------+-----------------------------+-------+