Ruby: how to check if a UTF-8 string contains only letters and numbers?


I have a UTF-8 string, which might be in any language.

How do I check that it does not contain any non-alphanumeric characters?

I could not find such a method in the UnicodeUtils Ruby gem.

Examples:

  1. ėččę91 - valid
  2. $120D - invalid
3

There are 3 answers

1
the Tin Man On BEST ANSWER

You can use the POSIX bracket expression [[:alnum:]] for alphanumeric characters:

#!/usr/bin/env ruby -w
# encoding: UTF-8

puts RUBY_VERSION

valid = "ėččę91"
invalid = "$120D"

puts valid[/[[:alnum:]]+/]
puts invalid[/[^[:alnum:]]+/]

Which outputs:

1.9.2
ėččę91
$
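The snippet above extracts the first matching run of characters rather than giving a yes/no answer. A minimal sketch of a yes/no check built on the same POSIX class might look like the following. The method name alnum_only? is made up for illustration, and match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not part of the answer above): true when every
# character of the string is matched by [[:alnum:]].
def alnum_only?(str)
  # \A and \z anchor to the start and end of the whole string,
  # so a single stray character anywhere makes the match fail.
  str.match?(/\A[[:alnum:]]+\z/)
end

puts alnum_only?("ėččę91") # expected: true
puts alnum_only?("$120D")  # expected: false
```

Because of the + quantifier, an empty string is reported as not alphanumeric; swap it for * if empty input should pass.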
3
Michael Papile On

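In Ruby regular expressions, \p{L} matches any letter in any script, and \p{N} matches any numeric character.

So if s is your string:

 s.match(/\A[\p{L}\p{N}]+\z/)

This returns a match only if every character in the string is a letter or a number, and nil otherwise. The \A and \z anchors tie the match to the whole string; ^ and $ would only tie it to a single line.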
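One thing to watch with this kind of pattern in Ruby is the choice of anchors: ^ and $ match at line boundaries, while \A and \z match only at the very start and end of the string. A small sketch with a made-up two-line input (match? assumes Ruby 2.4 or later):

```ruby
# Hypothetical input: a valid first line followed by an invalid second line.
s = "abc123\n$$$"

line_anchored   = /^[\p{L}\p{N}]+$/
string_anchored = /\A[\p{L}\p{N}]+\z/

# ^ and $ are satisfied by the first line alone.
p s.match?(line_anchored)   # => true
# \A and \z require the entire string to be letters and numbers.
p s.match?(string_anchored) # => false
```

If the input can contain newlines, the string-anchored form is the safer choice for validation.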

0
tchrist On

The pattern for one alphanumeric code point is

/[\p{Alphabetic}\p{Number}]/

From there it’s easy to extrapolate something like this, to test whether the string has a non-alphanumeric code point:

/[^\p{Alphabetic}\p{Number}]/

or this, to test that a line is all alphanumeric (in Ruby, ^ and $ anchor at line boundaries):

 /^[\p{Alphabetic}\p{Number}]+$/

or this, to test that the whole string is all alphanumeric:

/\A[\p{Alphabetic}\p{Number}]+\z/

Pick the one that best suits your needs.
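As a rough sketch of how these patterns might be wrapped for reuse, here are two hypothetical helpers. The constant and method names are illustrative only, and match? assumes Ruby 2.4 or later. Note that the negative and positive tests are not exact opposites on the empty string:

```ruby
# Patterns taken from the answer above; names are made up for this sketch.
NON_ALNUM = /[^\p{Alphabetic}\p{Number}]/
ALL_ALNUM = /\A[\p{Alphabetic}\p{Number}]+\z/

# True when at least one code point is neither alphabetic nor numeric.
def contains_non_alnum?(str)
  str.match?(NON_ALNUM)
end

# True when the whole string is one or more alphanumeric code points.
def all_alnum?(str)
  str.match?(ALL_ALNUM)
end

p contains_non_alnum?("$120D") # => true
p all_alnum?("ėččę91")         # => true
p contains_non_alnum?("")      # => false (nothing offending found)
p all_alnum?("")               # => false (+ needs at least one character)
```

Which behaviour is right for empty input depends on the application, so that is worth deciding explicitly.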