Access an optional capture by name when using Swift Regex Builder


I'm just getting started with regular expressions and Swift Regex, so a heads up that my terminology may be incorrect. I have boiled this problem down to a very simple task:

I have input lines that have either just one word (a name) or start with the word "Test" followed by one space and then a name. I want to extract the name and also be able to access - without using match indices - the match to "Test " (which may be nil). Here is code that better describes the problem:

import RegexBuilder

let line1 = "Test John"
let line2 = "Robert"

let nameReference = Reference(String.self)
let testReference = Reference(String.self)

let regex = Regex {
    Optionally {
        Capture(as:testReference) {
            "Test "
        } transform : { text in
            String(text)
        }
    }
    Capture(as:nameReference) {
        OneOrMore(.any)
    } transform : { text in
        String(text)
    }
}

if let matches = try? regex.wholeMatch(in: line1) { // USE line1 OR line2 HERE
    let theName = matches[nameReference]
    print("Name is \(theName)")
    // using index to access the test flag works fine for both line1 and line2:
    if let flag = matches.1, flag == "Test " {
        print("Using index: This is a test line")
    } else {
        print("Using index: Not a test line")
    }
    // but for line2, attempting to access with testReference crashes:
    if matches[testReference] == "Test " { // crashes for line2 (not surprisingly)
        print("Using reference: This is a test line")
    } else {
        print("Using reference: Not a test line")
    }
}

When regex.wholeMatch() is called with line1 things work as expected with output:

Name is John
Using index: This is a test line
Using reference: This is a test line

but when called with line2 it crashes with a SIGABRT and output:

Name is Robert
Using index: Not a test line
Could not cast value of type 'Swift.Optional<Swift.Substring>' (0x7ff84bf06f20) to 'Swift.String' (0x7ff84ba6e918).

The crash is not surprising, because the Capture(as:testReference) was never matched.

My question is: is there a way to do this without using match indices (matches.1)? An answer using Regex Builder would be much appreciated:-)

The documentation says Regex.Match has a subscript(String) method which "returns nil if there's no capture with that name". That would be ideal, but it works only when the match output is type AnyRegexOutput.
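For comparison, here is a minimal sketch of what that string-subscript route could look like when the regex is built from a pattern string, so the output type is AnyRegexOutput. It is not a Regex Builder solution. The helper name parseByName and the capture names "test" and "name" are made up for illustration, and it assumes a platform where Swift Regex is available (Swift 5.7+, macOS 13 / iOS 16 or later).

```swift
// Hypothetical helper (not from the question): looks captures up by name.
func parseByName(_ line: String) -> (isTest: Bool, name: String)? {
    // Building a regex from a String produces Regex<AnyRegexOutput>,
    // which is the case where the String subscript is available.
    let pattern = #"(?<test>Test )?(?<name>.+)"#
    guard let regex = try? Regex<AnyRegexOutput>(pattern),
          let match = try? regex.wholeMatch(in: line),
          let name = match["name"]?.substring
    else { return nil }

    // `substring` is nil when the optional group did not take part in the match.
    let isTest = match["test"]?.substring != nil
    return (isTest, String(name))
}

for line in ["Test John", "Robert"] {
    if let result = parseByName(line) {
        print("Name is \(result.name), test line: \(result.isTest)")
    }
}
```

The trade-off is that captures come back as untyped elements, so the compiler cannot check the capture names or types.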


There are 2 answers

Answer by nd. (best answer)

While I would prefer Tom Harrington's solution for this particular use case, the API supports optional references by setting the type of the reference to an Optional itself:

let nameReference = Reference(String.self)
let testReference = Reference(String?.self)  // The String? is crucial here

let regex = Regex {
    Optionally {
        Capture(as:testReference) {
            "Test "
        } transform : { text in
            String(text)
        }
    }
    Capture(as:nameReference) {
        OneOrMore(.any)
    } transform : { text in
        String(text)
    }
}

if let matches = try? regex.wholeMatch(in: line1) {
    if matches[testReference] == "Test " { // this does not crash; the subscript returns a String?
        print("Using reference: This is a test line")
    } else {
        print("Using reference: Not a test line")
    }
}

Note: if you want to have a reference to an optional Substring (Reference(Substring?.self)), then you must use Capture(as:_:transform:), because otherwise the compiler complains that Substring? and Substring are not equivalent.
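A minimal sketch of that Substring? variant, assuming the same two sample lines as in the question. The helper name parseWithReferences and the tuple labels are hypothetical, and the explicit closure return type is just one way to make the promotion to Substring? visible.

```swift
import RegexBuilder

// Hypothetical helper (not from the question): returns the test flag and the name.
func parseWithReferences(_ line: String) -> (isTest: Bool, name: Substring)? {
    let testReference = Reference(Substring?.self)  // optional reference type
    let nameReference = Reference(Substring.self)

    let regex = Regex {
        Optionally {
            Capture(as: testReference) {
                "Test "
            } transform: { text -> Substring? in
                text  // promoted to Substring? so it matches the reference type
            }
        }
        Capture(as: nameReference) {
            OneOrMore(.any)
        }
    }

    guard let match = try? regex.wholeMatch(in: line) else { return nil }
    // match[testReference] is a Substring?; nil means "Test " was not present.
    return (match[testReference] != nil, match[nameReference])
}
```

The name capture needs no transform because its reference type is the plain Substring that the capture already produces.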

Answer by Tom Harrington

I don't think you can get away with not using indexes, or at least code that knows the index but might hide it. Regular expression parsing works like that in any language, because it's always assumed that you know the order of elements in the expression.

For something like this, your example could be simplified to something like

let nameRegex = Regex {
    Optionally("Test ")  // at most one "Test " prefix, as described in the question
    Capture { OneOrMore(.anyNonNewline) }
}

if let matches = try? nameRegex.wholeMatch(in: line2) {
    let (_, name) = matches.output
    print("Name: \(name)")
}

That works for both of your sample lines. The let (_, name) doesn't use a numeric index but it's effectively the same thing since it uses index 1 as the value for name.

If your data is as straightforward as these examples, a regular expression may be overkill. You could work with if line1.hasPrefix("Test ") to detect lines with Test and then drop the first 5 characters, for example.
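A minimal sketch of that non-regex approach, assuming the prefix is always exactly "Test " (five characters). The function name parseLine and the tuple labels are made up for illustration.

```swift
// Hypothetical helper (not from the question): splits off an optional "Test " prefix.
func parseLine(_ line: String) -> (isTest: Bool, name: String) {
    let prefix = "Test "
    if line.hasPrefix(prefix) {
        // Drop the prefix by its length rather than hard-coding 5.
        return (true, String(line.dropFirst(prefix.count)))
    }
    return (false, line)
}

for line in ["Test John", "Robert"] {
    let result = parseLine(line)
    print("Name is \(result.name), test line: \(result.isTest)")
}
```

Using prefix.count instead of a literal 5 keeps the check and the drop in sync if the prefix ever changes.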