How to check if a string contains accented Latin characters like é in Ruby?

Question

2.4k views Asked by sbs At 25 June 2015 at 22:25

Given:

str1 = "é"   # Latin accent
str2 = "囧"  # Chinese character
str3 = "ジ"  # Japanese character
str4 = "e"   # English character

How can I differentiate str1 (a Latin character with an accent) from the rest of the strings?

Update:

Given

str1 = "\xE9" # Latin accent é actually stored as \xE9 reading from a file

How would the answer be different?

Original Q&A

There are 3 answers

codevolution On 25 June 2015 at 22:53

Try /\p{Latin}/.match(strX), or /[\p{Latin}&&[^a-zA-Z]]/ if you want to detect only special (non-ASCII) Latin characters. Note that the && intersection only works inside a character class, hence the outer brackets.

By the way, "e" (str4) is also a Latin character.
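To make the difference between the two patterns concrete, here is a minimal sketch. The constant and method names (ANY_LATIN, NON_ASCII_LATIN, any_latin?, non_ascii_latin?) are made up for illustration, and the string literals are assumed to be UTF-8.

```ruby
# Matches any character in the Latin script, including plain a-z and A-Z.
ANY_LATIN = /\p{Latin}/

# Matches Latin-script characters other than the unaccented ASCII letters.
# The && intersection must be written inside a character class.
NON_ASCII_LATIN = /[\p{Latin}&&[^a-zA-Z]]/

# Hypothetical helper: true if the string contains any Latin-script character.
def any_latin?(str)
  !!(str =~ ANY_LATIN)
end

# Hypothetical helper: true if the string contains a Latin-script character
# that is not one of the letters a-z or A-Z.
def non_ascii_latin?(str)
  !!(str =~ NON_ASCII_LATIN)
end
```

With the four strings from the question, any_latin? is true for both "é" and "e", while non_ascii_latin? is true only for "é".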

Hope it helps.

Wally Altman On 25 June 2015 at 23:27

I'd use a two-stage approach:

1. Rule out strings containing non-Latin characters by attempting to encode the string as Latin-1 (ISO-8859-1).
2. Test for accented characters with a regular expression.

Example:

def is_accented_latin?(test_string)
  test_string.encode("ISO-8859-1")   # just to see if it raises an exception

  test_string.match(/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöùúûüýþÿ]/)
rescue Encoding::UndefinedConversionError
  false
end

I strongly suggest you select for yourself the accented characters you're attempting to screen for, rather than just copying what I've written; I certainly may have missed some. Also note that this will always return false for strings containing non-Latin characters, even if the string also contains a Latin character with an accent.
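If maintaining a hand-picked list is a concern, one possible alternative (a different technique from the one above, not a drop-in replacement) is to decompose the string with Unicode normalization and look for a Latin letter followed by a combining mark. This is only a sketch: the method name accented_latin_nfd? is made up, it assumes Ruby 2.2+ for String#unicode_normalize and a UTF-8 string, and it will not flag letters that have no decomposition, such as ø, æ or ß.

```ruby
# Hypothetical helper, assuming Ruby 2.2+ and a string in a Unicode encoding.
def accented_latin_nfd?(str)
  # NFD splits a precomposed "é" into "e" followed by U+0301 (combining acute).
  decomposed = str.unicode_normalize(:nfd)

  # Look for a Latin-script letter immediately followed by a combining mark.
  !!(decomposed =~ /\p{Latin}\p{M}/)
end
```

Unlike the two-stage approach, this returns true for a string that mixes an accented Latin letter with non-Latin characters, so keep the Latin-1 check if that behaviour is not wanted.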

Matt Brictson On 26 June 2015 at 02:22 (accepted answer)

I would first strip out all plain ASCII characters with gsub, and then check with a regex whether any Latin characters remain. This should detect accented Latin characters.

def latin_accented?(str)
  str.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/
end

latin_accented?("é")  #=> 0 (truthy)
latin_accented?("囧") #=> nil (falsy)
latin_accented?("ジ") #=> nil (falsy)
latin_accented?("e")  #=> nil (falsy)
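Regarding the update in the question: a lone "\xE9" byte is not valid UTF-8, so the regex checks tend to raise an error on it rather than return a result. A minimal sketch, assuming the file really is Latin-1 (ISO-8859-1) encoded, is to relabel the bytes and transcode to UTF-8 first. The method name latin_accented_bytes? and its source_encoding parameter are made up for illustration.

```ruby
# Hypothetical helper. Assumes `raw` holds bytes in `source_encoding`
# (ISO-8859-1 by default); that is a guess about the file, not a given.
def latin_accented_bytes?(raw, source_encoding = Encoding::ISO_8859_1)
  # Relabel the bytes with their real encoding, then transcode to UTF-8.
  # dup avoids mutating the caller's string.
  utf8 = raw.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)

  # Same idea as latin_accented?: drop ASCII, then look for Latin leftovers.
  !!(utf8.gsub(/\p{ASCII}/, "") =~ /\p{Latin}/)
end

latin_accented_bytes?("\xE9") #=> true
latin_accented_bytes?("e")    #=> false
```

If the file can be opened with an explicit external encoding instead, for example File.read(path, encoding: "ISO-8859-1:UTF-8"), the string arrives already transcoded and latin_accented? can be used unchanged.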
