How do I match non-letters and non-numbers after a bunch of numbers?


I'm using Ruby 2.4. I want to match zero or more characters that are neither letters nor digits, followed by one or more digits, followed by any number of characters that are neither letters nor digits. However, this string

2.4.0 :001 > token = "17 Milton,GA"
 => "17 Milton,GA"
...
2.4.0 :004 > Regexp.new("\\A([[:space:]]|[^\p{L}^0-9])*\\d+[^\p{L}^0-9]*\\z").match?(token.downcase)
 => true

is matching my regular expression, and I don't want it to, since letters follow the number. What do I need to adjust in my regexp so that the only thing that can match after the digits is non-letters and non-digits?

There are 2 answers

Wiktor Stribiżew On BEST ANSWER

There are three issues with the regex.

1) When you pass a double-quoted string literal to the Regexp.new constructor, you need to double every backslash that should reach the regex engine (\p => \\p). Otherwise Ruby drops the backslash before the pattern is compiled.

2) [^\p{L}^0-9] is the wrong construct for "any char except a letter or a digit" because the second ^ is treated as a literal ^ symbol. At a minimum, you need to remove the second ^. You may also use [^[:alnum:]] to match any non-alphanumeric character.

3) The pattern above matches whitespace, too, so you do not need to alternate it with [[:space:]]: ([[:space:]]|[^\p{L}^0-9])* -> [^\p{L}0-9]*.
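
To see points 1 and 2 concretely, the following sketch prints what Ruby hands to the regex engine in each case. The variable names (undoubled, doubled, broken, fixed) are illustrative only, and "milton" stands in for the downcased letters of the sample string.

```ruby
# What Ruby passes to the regex engine for each string literal.
undoubled = "[^\p{L}^0-9]"    # "\p" is not a string escape, so the backslash is dropped
doubled   = "[^\\p{L}0-9]"    # "\\p" reaches the engine as \p

puts undoubled   # [^p{L}^0-9]
puts doubled     # [^\p{L}0-9]

# Without \p, the class only excludes the literal characters p, {, L, }, ^ and 0-9,
# so lowercase letters such as "m" slip through.
broken = Regexp.new("\\A#{undoubled}*\\z")
fixed  = Regexp.new("\\A#{doubled}*\\z")

puts broken.match?("milton")   # true
puts fixed.match?("milton")    # false

# A ^ that is not first in a bracket expression is an ordinary character.
puts(/[^\p{L}^0-9]/.match?("^"))   # false: the literal ^ is excluded
puts(/[^\p{L}0-9]/.match?("^"))    # true
```

This would explain why the original pattern accepted "17 milton,ga": after the backslash is dropped, none of the lowercase letters in that string are excluded by the bracket expression.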

So, you may use your fixed Regexp.new("\\A[^\\p{L}0-9]*\\d+[^\\p{L}0-9]*\\z") regexp, or use

/\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/.match?(token.downcase)

See the Rubular demo where your sample string is not matched with the regex.

Details:

  • \A - start of a string
  • [^[:alnum:]]* - 0+ non-alphanumeric chars
  • \d+ - 1+ digits
  • [^[:alnum:]]* - 0+ non-alphanumeric chars
  • \z - end of string.
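
As a quick sanity check of the breakdown above, this sketch runs the pattern against the question's sample string and a few made-up inputs. The names pattern, samples and results are assumptions introduced for the example.

```ruby
pattern = /\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/

# The first entry is the string from the question; the rest are illustrative.
samples = ["17 Milton,GA", "17", " ,17.. ", "$9^*123@-"]

results = samples.map { |s| [s, pattern.match?(s)] }.to_h
results.each { |s, ok| puts "#{s.inspect} => #{ok}" }
# "17 Milton,GA" => false   (letters follow the digits)
# "17" => true
# " ,17.. " => true
# "$9^*123@-" => false      (a second run of digits follows the first)
```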
Cary Swoveland On

Here are three ways to do that.

#1 Use a regular expression with a capture group

r = /
    \A                    # match beginning of string
    [^[[:alnum:]]]*       # match 0+ chars other than digits and letters
    (\d+)                 # match 1+ digits in capture group 1
    [^[[:alnum:]]]*       # match 0+ chars other than digits and letters
    \z                    # match end of string
    /x                    # free-spacing regex definition mode

"$ ^*123@-"[r, 1]         #=> '123'
"$ ^*123@-a?"[r, 1]       #=> nil
"$9^*123@-"[r, 1]         #=> nil

#2 Use a regular expression with \K and a positive lookahead

r = /
    \A                    # match beginning of string
    [^[[:alnum:]]]*       # match 0+ chars other than digits and letters
    \K                    # discard all matched so far
    \d+                   # match 1+ digits
    (?=[^[[:alnum:]]]*\z) # match 0+ chars other than digits and letters,
                          # then end of string, in a positive lookahead
    /x                    # free-spacing mode

"$ ^*123@-"[r]            #=> '123'
"$ ^*123@-a?"[r]          #=> nil
"$9^*123@-"[r]            #=> nil

Note that we cannot have a positive lookbehind in place of \K as Ruby does not support variable-length lookbehinds.

#3 Use simpler regular expressions together with String methods

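def extract(str)
  # reject any string that contains a letter
  return nil if str =~ /[[:alpha:]]/
  # collect every run of digits; accept only if there is exactly one
  a = str.scan(/\d+/)
  a.size == 1 ? a.first : nil
end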

extract("$ ^*123@-")      #=> '123'
extract("$ ^*123@-a?")    #=> nil
extract("$9^*123@-")      #=> nil