Regex won't compile because of 'unclosed character class' in named capture group


I am getting "Error: unclosed character class" from a Rust regex. The same pattern works in an online regex tester that uses PCRE-compliant syntax, but the regex crate on the Rust Playground rejects it.

The character class must include a minus sign. I tried putting the minus sign in the first position, in the last position, and leaving it out altogether, but I always get an error.

For most of the expected inputs, the source string will be "op(number)" for some op and some non-negative integer. For a few, I expect "op(number/number/number)".

If there is a superior way to extract the named captures, I am all ears.

use lazy_static::lazy_static;
use regex::Regex;

fn main() {
    lazy_static! {
        static ref FANCY_OPCODE_RE: Regex = Regex::new(r"(?x)
            ^                              # Match start of string
            (?P<opname>[-a-zA-Z#+]+)       # Match abbreviated name of OpCode as 'opname'
            \(                             # Open parentheses
            (?P<arg1>[0-9]+)               # Match first number as 'arg1'
            (/                             # Delimiter
            (?P<arg2>[0-9]+)               # Optionally match second number as 'arg2'
            /                              # Delimiter
            (?P<arg3>[0-9]+))?             # Optionally match third number as 'arg3'
            \)                             # Closing parenthesis
            $                              # Match end of string
        ").unwrap();
    }
    let s = "+loop(3)";
    let opname: String; 
    let arg1: String;
    let arg2: String;
    let arg3: String;
    match FANCY_OPCODE_RE.captures(s) {
        Some(cap) => { 
            opname = format!("{:?}", cap.name("opname")); 
            arg1 = format!("{:?}", cap.name("arg1"));
            arg2 = format!("{:?}", cap.name("arg2"));
            arg3 = format!("{:?}", cap.name("arg3"));
        },
        None => { 
            opname = "No match".to_string(); 
            arg1 = String::new();
            arg2 = String::new();
            arg3 = String::new();
        }
    }

    println!("opname = {}, arg1 = {}, arg2 = {}, arg3 = {}", opname, arg1, arg2, arg3);
}

Here is the error message:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1: (?x)
 2:             ^                              # Match start of string
 3:             (?P<opname>[-a-zA-Z#+]+)       # Match abbreviated name of OpCode as 'opname'
                           ^^
 4:             \(                             # Open parentheses
 5:             (?P<arg1>[0-9]+)               # Match first number as 'arg1'
 6:             (/                             # Delimiter
 7:             (?P<arg2>[0-9]+)               # Optionally match second number as 'arg2'
 8:             /                              # Delimiter
 9:             (?P<arg3>[0-9]+))?             # Optionally match third number as 'arg3'
10:             \)                             # Closing parenthesis
11:             $                              # Match end of string
12:         
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)', src/main.rs:17:12

1 answer

Shepmaster (best answer)

When debugging a problem, it's useful to create a minimal, reproducible example. By deleting parts of your regex that don't cause the problem, you can quickly reduce to:

Regex::new(r"(?x)(?P<opname>[-a-zA-Z#+]+)").unwrap();

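The problem is that you have included the comment character # inside your regex. In (?x) mode, # starts a comment that runs to the end of the line, even inside a character class, so the closing ] is never seen. Escape it: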

[-a-zA-Z\#+]