I'm using Ruby 2.4. I want to match any number of non-letter, non-digit characters, followed by one or more digits, followed by any number of non-letter, non-digit characters. However, this string
2.4.0 :001 > token = "17 Milton,GA"
=> "17 Milton,GA"
...
2.4.0 :004 > Regexp.new("\\A([[:space:]]|[^\p{L}^0-9])*\\d+[^\p{L}^0-9]*\\z").match?(token.downcase)
=> true
is matching my regular expression and I don't want it to, since there are letters that follow the number. What do I need to adjust in my regexp so that the only thing I can match after the numbers will be non-letters and non-numbers?
There are a couple of issues with the regex.
1) When you use a double-quoted string literal in a Regexp.new constructor, you need to double a backslash to get a literal one (\p => \\p).
2) [^\p{L}^0-9] is the wrong construct for "any char but a letter or a digit" because the second ^ is treated as a literal ^ symbol. You need to remove the second ^ at least. You may also use [^[:alnum:]] to match any non-alphanumeric symbol.
3) The pattern above matches whitespace, too, so you do not need to alternate it with [[:space:]]: ([[:space:]]|[^\p{L}^0-9])* -> [^\p{L}0-9]*.
So, you may use your fixed
Regexp.new("\\A[^\\p{L}0-9]*\\d+[^\\p{L}0-9]*\\z")
regexp, or use \A[^[:alnum:]]*\d+[^[:alnum:]]*\z, the pattern broken down under Details. See the Rubular demo where your sample string is not matched with the regex.
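To see why the original pattern matched, it can help to print what the double-quoted string actually contains before it reaches Regexp.new. The following is a minimal sketch; the constant names are made up for illustration and are not part of the question's code.

```ruby
# In a double-quoted string, "\p" is not a known escape, so Ruby drops the backslash.
BROKEN_SOURCE = "[^\p{L}^0-9]"   # what the question effectively passed to Regexp.new
FIXED_SOURCE  = "[^\\p{L}0-9]"   # backslash doubled, second ^ removed

puts BROKEN_SOURCE   # [^p{L}^0-9]
puts FIXED_SOURCE    # [^\p{L}0-9]

BROKEN = Regexp.new(BROKEN_SOURCE)
FIXED  = Regexp.new(FIXED_SOURCE)

# The broken class only excludes the chars p, {, L, }, ^ and 0-9, so "m" is accepted.
puts BROKEN.match?("m")   # true
puts FIXED.match?("m")    # false

# With the backslash fixed but the second ^ kept, the caret is a literal excluded char.
CARET_KEPT = Regexp.new("[^\\p{L}^0-9]")
puts CARET_KEPT.match?("^")   # false
puts FIXED.match?("^")        # true
```

This suggests the lowercase letters in "17 milton,ga" slipped through simply because none of them is p, which is why the question's regexp returned true.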
Details:
\A - start of the string
[^[:alnum:]]* - 0+ non-alphanumeric chars
\d+ - 1+ digits
[^[:alnum:]]* - 0+ non-alphanumeric chars
\z - end of the string.
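As a quick check of both patterns outside Rubular, here is a small sketch written with regex literals, so no backslash doubling is needed. Only "17 Milton,GA" comes from the question; the other sample strings, the constant names, and the helper name are assumptions added for illustration.

```ruby
# Both patterns from the answer as regex literals.
FIXED_PATTERN = /\A[^\p{L}0-9]*\d+[^\p{L}0-9]*\z/
ALNUM_PATTERN = /\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/

# Hypothetical helper: true when the token is digits surrounded only by
# non-letter, non-digit characters according to both patterns.
def number_only?(token)
  t = token.downcase
  FIXED_PATTERN.match?(t) && ALNUM_PATTERN.match?(t)
end

# "17 Milton,GA" is the question's sample; the rest are made-up test inputs.
["17 Milton,GA", "17", " - 17 , ", "#17!", "a17"].each do |token|
  puts "#{token.inspect}: #{number_only?(token)}"
end
```

The two patterns agree on these inputs, but they are not guaranteed to be identical in general, since [[:alnum:]] and \p{L} plus 0-9 do not cover exactly the same set of Unicode characters.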