Regex like telephone number on Hive without prefix (+01)

525 views Asked by At

We have a problem with a regular expression on hive. We need to exclude the numbers with +37 or 0037 at the beginning of the record (it could be a false result on the regex like) and without letters or space.

We're trying with this one: regexp_like(tel_number,'^\+37|^0037+[a-zA-ZÀÈÌÒÙ ]') but it doesn't work.

Edit: we want it to come out from the select as true (correct number) or false.

3

There are 3 answers

3
leftjoin On BEST ANSWER

To exclude numbers which start with +01 0r +001 or +0001 and having only digits without spaces or letters:

... WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'

Special characters like + and character classes like \d in Hive should be escaped using double-slash: \\+ and \\d.

0
Manuel Batsching On

The general question is, if you want to describe a malformed telephone number in your regex and exclude everything that matches the pattern or if you want to describe a well-formed telephone number and include everything that matches the pattern.

Which way to go, depends on your scenario. From what I understand of your requirements, adding "not starting with 0037 or +37" as a condition to a well-formed telephone number could be a good approach.

The pattern would be like this:

  1. Your number can start with either + or 00: ^(\+|00)

  2. It cannot be followed by a 37 which in regex can be expressed by the following set of alternatives:

    a. It is followed first by a 3 then by anything but 7: 3[0-689]

    b. It is followed first by anything but 3 then by any number: [0-24-9]\d

  3. After that there is a sequence of numbers of undefined length (at least one) until the end of the string: \d+$

Putting everything together:

^(\+|00)(3[0-689]|[0-24-9]\d)\d+$

You can play with this regex here and see if this fits your needs: https://regex101.com/r/KK5rjE/3

Note: as leftjoin has pointed out: To use this regex in hive you might need to additionally escape the backslashes \ in the pattern.

0
Wiktor Stribiżew On

You can use

regexp_like(tel_number,'^(?!\\+37|0037)\\+?\\d+$')

See the regex demo. Details:

  • ^ - start of string
  • (?!\+37|0037) - a negative lookahead that fails the match if there is +37 or 0037 immediately to the right of the current location
  • \+? - an optional + sign
  • \d+ - one or more digits
  • $ - end of string.