How does one match alphabets of different languages in PostgreSQL?

63 views Asked by At

How to find sequence of Alphabets of non-ASCII (other languages) in a given string in PostgreSQL? For example, ASCII alphabets can be matched using '[A-Za-z]'.

In SQL Server, @ch BETWEEN 'A' and 'Z' matches the characters like Ñ, ü, Ä, etc.

3

There are 3 answers

5
Laurenz Albe On BEST ANSWER

That depends on the collation you are using. With most natural language collations, the comparison would work:

SELECT 'ñ' COLLATE "en-US-x-icu" BETWEEN 'A' AND 'Z';

 ?column? 
══════════
 t
(1 row)

The easiest way to check if a string contains only alphabetic characters is a regular expression:

SELECT NOT '中文µxY' COLLATE "de_AT.utf8" ~ '[^[:alpha:]]';

 ?column? 
══════════
 t
(1 row)

SELECT NOT 'a+b' COLLATE "de_AT.utf8" ~ '[^[:alpha:]]';

 ?column? 
══════════
 f
(1 row)
1
Maimoona Abid On

In PostgreSQL Regular expressions and character classes can be used to match non-ASCII alphabets in a given string. Try running the following query to extract rows from your table where one or more characters from the non-ASCII alphabet are present in the 'your_column'. For your particular case, change the table and column names as necessary.

SELECT *
FROM your_table
WHERE your_column ~ '\p{L}';

Hope it's helpful :)

1
Namsi Lydia On

In Postgresql one can use regular expression to find the sequence of non -ASCII alphabets.The [^[:ascii:]]pattern matches any character that is not in the ASCII character set and it can be done as follows:

SELECT * FROM your_table WHERE your_column ~ '[^[:ascii:]]';