The right way to check if a string has Hebrew chars


Hebrew characters have Unicode code points between 1424 and 1514 (hex 0590 to 05EA).

I'm looking for the right, most efficient and most Pythonic way to check whether a string contains Hebrew characters.

First I came up with this:

def has_hebrew(s):
    # Return True as soon as one character falls in the Hebrew range.
    for c in s:
        if ord(c) >= 1424 and ord(c) <= 1514:
            return True
    return False

Then I came up with a more elegant implementation:

return any(map(lambda c: (ord(c) >= 1424 and ord(c) <= 1514), s))

And maybe:

return any([(ord(c) >= 1424 and ord(c) <= 1514) for c in s])

Which of these is best? Or should I do it differently?


There are 3 answers

MRAB (best answer)

You could do:

# Python 3.
return any("\u0590" <= c <= "\u05EA" for c in s)
# Python 2.
return any(u"\u0590" <= c <= u"\u05EA" for c in s)
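As a minimal sketch, the Python 3 line above can be wrapped in a function so it runs on its own. The name `has_hebrew` is only an illustration and the range is the one given in the question. Single-character strings compare by code point, so no `ord()` call is needed, and the generator expression lets `any()` stop at the first match instead of building a full list first.

```python
def has_hebrew(s):
    """Return True if any character of s lies in U+0590..U+05EA."""
    # Range taken from the question; has_hebrew is a hypothetical name.
    return any("\u0590" <= c <= "\u05EA" for c in s)


# "\u05e9\u05dc\u05d5\u05dd" is the Hebrew word "shalom".
print(has_hebrew("\u05e9\u05dc\u05d5\u05dd"))
print(has_hebrew("shalom"))
```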

Marcin

Your basic options are:

  1. Match against a regex containing the range of characters; or
  2. Iterate over the string, testing for membership of the character in a string or set containing all of your target characters, and break if you find a match.
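The two options above could be sketched roughly as follows. The names `HEBREW_RE`, `HEBREW_CHARS`, `has_hebrew_regex` and `has_hebrew_set` are made up for this illustration, and the range U+0590 to U+05EA is assumed from the question.

```python
import re

# Assumption: the same U+0590..U+05EA range used in the question.
HEBREW_RE = re.compile("[\u0590-\u05EA]")
HEBREW_CHARS = frozenset(chr(cp) for cp in range(0x0590, 0x05EA + 1))


def has_hebrew_regex(s):
    # Option 1: search() returns a match object for the first character
    # in the range, or None if there is none.
    return HEBREW_RE.search(s) is not None


def has_hebrew_set(s):
    # Option 2: test each character for membership in the target set.
    return not HEBREW_CHARS.isdisjoint(s)
```

Compiling the pattern and building the set once at module level keeps that setup cost out of each call.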

Only actual testing can show which is going to be faster.
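One way to run such a test is with the standard `timeit` module. The sketch below compares a generator-based check with a regex-based one; the function names and the sample string are hypothetical, and the timings will depend on the interpreter and on what typical input looks like, so no results are shown here.

```python
import re
import timeit

# Assumption: the U+0590..U+05EA range from the question.
HEBREW_RE = re.compile("[\u0590-\u05EA]")


def with_any(s):
    return any("\u0590" <= c <= "\u05EA" for c in s)


def with_regex(s):
    return HEBREW_RE.search(s) is not None


# Hypothetical worst case: a long string whose only Hebrew letter is last.
sample = "x" * 10_000 + "\u05d0"

for func in (with_any, with_regex):
    seconds = timeit.timeit(lambda: func(sample), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 calls")
```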

yekta

It's simple to check the first character with unicodedata:

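import unicodedata


def _first_char_name(term):
    # Unicode name of the first non-whitespace character, or '' if the
    # string is empty or the character has no name.
    stripped = term.strip()
    return unicodedata.name(stripped[0], '') if stripped else ''


def is_greek(term):
    return 'GREEK' in _first_char_name(term)


def is_hebrew(term):
    return 'HEBREW' in _first_char_name(term)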
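Since the question asks about any character in the string rather than only the first, the same name-based idea might be extended along these lines. This is a sketch, and `contains_hebrew` is a hypothetical name. Note that matching on the character name is not identical to the numeric range in the question: it accepts every character whose Unicode name contains "HEBREW", wherever it is encoded.

```python
import unicodedata


def contains_hebrew(text):
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is supplied, so '' is passed as the fallback.
    return any('HEBREW' in unicodedata.name(ch, '') for ch in text)
```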