Regular expression


I would like a regular expression that extracts only the number between the '_' and the '/'. It should match the numbers 1 to 9999 and exclude numbers that consist only of 0s.

catalogue/tipping-the-velvet_999/index.html
catalogue/soumission_998/index.html
catalogue/the-requiem-red_995/index.html
catalogue/the-black-maria-1700-1749_991/index.html
catalogue/olio_984/index.html
catalogue/mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html

Thanks

I'd like to get the list of numbers!

2

There are 2 answers

8
BlueQuark On

To get just the number out of a line, you can use the regex (?<=_)[1-9]\d{0,3}(?=/).

  • (?<=_) is a lookbehind assertion: it requires that the match be preceded by _ but doesn't include it in the result.

  • [1-9] matches a single digit that's not 0.

  • \d matches digits 0 through 9.

  • {0,3} means zero to three repetitions of the preceding token. So taken together, \d{0,3} will match up to three further digits.

  • (?=/) is a lookahead assertion: it requires that / follows the match, but doesn't include it in the result.

Note that this regex extracts the number from a single line of input; loop over all the lines to collect every applicable number.
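As a rough illustration of that loop, here is a minimal Python sketch. The question doesn't say which language is in use, so Python is an assumption, and the function name extract_numbers and the last two sample paths (added only to show the rejected cases) are hypothetical.

```python
import re

# The first three paths come from the question; the last two are
# hypothetical additions that show inputs the regex should reject.
PATHS = [
    "catalogue/tipping-the-velvet_999/index.html",
    "catalogue/soumission_998/index.html",
    "catalogue/the-black-maria-1700-1749_991/index.html",
    "catalogue/zero_0/index.html",
    "catalogue/padded_0042/index.html",
]

# Lookbehind for '_', a number 1-9999 with no leading zero, lookahead for '/'.
LOOKAROUND = re.compile(r"(?<=_)[1-9]\d{0,3}(?=/)")


def extract_numbers(lines):
    """Collect the first matching number on each line, converted to int."""
    numbers = []
    for line in lines:
        match = LOOKAROUND.search(line)
        if match:
            # The whole match is the number, since the lookarounds consume nothing.
            numbers.append(int(match.group()))
    return numbers


print(extract_numbers(PATHS))  # [999, 998, 991]
```

The 1700-1749 in the third path is ignored because it isn't directly preceded by _, and the two hypothetical paths are skipped because their numbers start with 0.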

0
Ted Lyngmo On

You can capture the number between _ and /:

_([1-9]\d{0,3})\/


This makes sure that the number starts with a digit from [1-9] and is followed by zero to three further digits. _0/ will therefore not match.

  • _ - a literal _
  • ( - start of capture group
    • [1-9] - match on digits 1-9
    • \d{0,3} - match on digits 0-9 zero to three times
  • ) - end of capture group
  • \/ - a / (depending on the interface you use, escaping the / with \ may not be needed)
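For what it's worth, a capture-group pattern like this can pull every number out of the whole input in one pass. Below is a minimal Python sketch; the language is an assumption (the question doesn't name one), find_ids is a hypothetical helper name, and the last sample line is made up to show the all-zero case.

```python
import re

# The first two lines come from the question; the third is a hypothetical
# line included to show that an all-zero number is not captured.
TEXT = """\
catalogue/olio_984/index.html
catalogue/mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html
catalogue/zero_0/index.html
"""

# In Python the '/' needs no escaping, so the pattern is written without '\'.
CAPTURE = re.compile(r"_([1-9]\d{0,3})/")


def find_ids(text):
    """Return every captured number in the text as a list of ints."""
    # With exactly one capture group, findall returns the group contents only.
    return [int(number) for number in CAPTURE.findall(text)]


print(find_ids(TEXT))  # [984, 983]
```

Unlike the lookaround version, the overall match here includes the _ and the /, so the number has to be read from group 1 rather than from the whole match.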