How to use Swift literal regex expressions in switch case pattern statements?



Based on the examples from the WWDC 2022 presentation slides, the following is expected to compile and run:

import Foundation
import RegexBuilder

switch "abc" {
    case /\w+/:
        print("matched!")
    default:
        print("not matched.")
}

However, the following error is produced:

Expression pattern of type Regex<Substring> cannot match values of type String

Can the switch case statement with a Swift regex literal be modified so that it works? How would one use the new Swift 5.7 regex capabilities in a switch case pattern?


Answer by Sweeper (accepted)

From what I have found, the "matching with regexes in a switch statement" feature has not been implemented, because people could not agree on what its exact semantics should be. In a case such as

switch "---abc---" {
case /\w+/:
    print("foo")
default:
    print("bar")
}

which branch should the switch statement run? Should it count as a match only if the whole string matches the regex, or is it enough for a substring of the switched string to match? In other words, should it use wholeMatch or firstMatch? See more of the discussion here.
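To make the ambiguity concrete, here is a minimal sketch that asks both questions of the same input. It assumes Swift 5.7+ (on Apple platforms, Regex also requires macOS 13 / iOS 16 or later) and uses the extended #/…/# literal delimiter so it does not depend on bare /…/ literals being enabled. The variable names are illustrative only.

```swift
// Sketch (assumes Swift 5.7+): one regex, one input, two different questions.
let input = "---abc---"
let word = #/\w+/#

// firstMatch(in:) searches for the first substring that matches anywhere in the input.
// Errors are turned into nil with try?, as in the answers.
let firstResult = try? word.firstMatch(in: input)
print(firstResult.map { String($0.output) } ?? "no match")   // abc

// wholeMatch(in:) succeeds only if the entire input matches the pattern.
let wholeResult = try? word.wholeMatch(in: input)
print(wholeResult == nil ? "no match" : "whole match")        // no match
```

So a ~= built on firstMatch would take the case branch for "---abc---", while one built on wholeMatch would fall through to default.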

In the end, they were not able to come to a conclusion, and

The proposal has been accepted with modifications (the modification being to subset out ~= for now).

So the ~= operator was not added for Regex<Output>, which means you cannot use a regex directly as a switch case pattern.

You can add it yourself if you want, if you can decide between the two semantics :) For example:

func ~=(regex: Regex<Substring>, str: String) -> Bool {
    // wholeMatch semantics: the entire string must match the pattern.
    // Errors thrown during matching count as "not match".
    (try? regex.wholeMatch(in: str)) != nil
}
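With an overload like this in scope, a regex literal can be used directly as a case pattern. The sketch below assumes the whole-match semantics chosen above and Swift 5.7+. Making the operator generic over Output (so regexes with capture groups are accepted too) is an addition not found in the answer, and the describe(_:) helper is hypothetical.

```swift
// User-defined pattern-matching operator: a Regex pattern matches a String
// only if the whole string matches (wholeMatch semantics).
// Generic over Output so regexes with capture groups are accepted as well.
func ~= <Output>(regex: Regex<Output>, input: String) -> Bool {
    // Errors thrown during matching count as "no match".
    (try? regex.wholeMatch(in: input)) != nil
}

// Hypothetical helper that exercises the operator in a switch.
func describe(_ s: String) -> String {
    switch s {
    case #/\d+/#:
        return "digits"
    case #/\w+/#:
        return "word"
    default:
        return "no whole match"
    }
}

print(describe("123"))        // digits
print(describe("abc"))        // word
print(describe("---abc---"))  // no whole match
```

Cases are tried top to bottom, so the more specific \d+ pattern is listed before \w+.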
Answer by marc-medley

Can the switch case statement with a Swift regex literal expression be somehow modified to function OK?

Yes: a case let … where … pattern with a /regex/ literal can be used. It is an alternative to defining ~= directly, and it can be written so that it avoids any potential future ambiguity with a ~= that the standard library might define later.

Stephen Canon noted in a comment: "Note that defining your own ~= will not create errors down the road, because overload resolution favors operations defined outside the standard library over those defined in the standard library. It could result in confusion for developers reading your source, however."

Discussion…

Match-Part-Or-Whole Example - A fundamental approach where the classic anchored regex pattern /^…$/ is used to match an entire line:

extension String {
    func matchFirst(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.firstMatch(in: self)) != nil
    }
}

switch "---abc---" {
    case let s where s.matchFirst(/^\w+$/):
        print("entire line contains alphanumerics: '\(s)'")
    case let s where s.matchFirst(/\w+/):
        print("alphanumerics found in string: '\(s)'")
    default:
        print("no alphanumerics found")
}
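To check which branch runs for different inputs, the switch above can be wrapped in a function. This is a minimal sketch assuming Swift 5.7+; classify(_:) is a hypothetical helper that returns a label instead of printing, and the #/…/# delimiter is used in place of bare /…/ literals.

```swift
extension String {
    func matchFirst(_ regex: Regex<Substring>) -> Bool {
        // Errors count as "not match".
        (try? regex.firstMatch(in: self)) != nil
    }
}

// Hypothetical helper: the answer's switch, returning a label.
func classify(_ input: String) -> String {
    switch input {
    case let s where s.matchFirst(#/^\w+$/#):
        // ^ and $ force the match to span the whole string.
        return "entire line"
    case let s where s.matchFirst(#/\w+/#):
        // Unanchored: any run of word characters is enough.
        return "partial"
    default:
        return "none"
    }
}

print(classify("abc"))        // entire line
print(classify("---abc---"))  // partial
print(classify("---"))        // none
```

Because cases are evaluated in order, the anchored pattern has to come first; otherwise the unanchored case would also claim "abc".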

Whole-Match-Only Example - A "whole match only" regex approach where a partial match is not possible:

extension String {
    func matchWhole(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.wholeMatch(in: self)) != nil
    }
}

switch "---abc---" {
    case let s where s.matchWhole(/\w+/):
        print("all alphanumerics: '\(s)'")
    //case partial match not available. whole or nothing.
    default:
        print("no match for /\\w+/")
}
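The same kind of check for the whole-match-only variant shows that "---abc---" falls through to default. A minimal sketch, assuming Swift 5.7+; classifyWhole(_:) is a hypothetical helper.

```swift
extension String {
    func matchWhole(_ regex: Regex<Substring>) -> Bool {
        // Errors count as "not match".
        (try? regex.wholeMatch(in: self)) != nil
    }
}

// Hypothetical helper: the answer's whole-match switch, returning a label.
func classifyWhole(_ input: String) -> String {
    switch input {
    case let s where s.matchWhole(#/\w+/#):
        return "all alphanumerics"
    default:
        // Strings that only partially match, such as "---abc---", end up here.
        return "no match"
    }
}

print(classifyWhole("abc"))        // all alphanumerics
print(classifyWhole("---abc---"))  // no match
```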

I ended up using the "classic" Match-Part-Or-Whole Example approach instead of the Whole-Match-Only Example and func ~= approaches for the following reasons:

  • func ~= - could possibly be defined by Swift at some future time. Possible future confusion.
  • Whole-Match-Only Example - does not support both partial and full matches. Less expressive.
  • Match-Part-Or-Whole Example
    • leaves ~= undefined which allows for the possible future definition by Swift. Avoids possible future confusion.
    • does support both partial and full matches. More expressive.
    • ^…$ is expressly stated for a full line match. More readable.

Note: Extending String with both convenience wrappers, such as matchFirst and matchWhole, allows either approach to be chosen at the point of use. This approach provides the following benefits:

  • expressive
  • co-locates both choices in the point-of-use autocompletion list
  • avoids the conflict of one vs the other in the lower level extension
  • does not presume any interpretation for the not-yet-officially-defined ~=.
extension String {
    func matchFirst(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.firstMatch(in: self)) != nil
    }

    func matchWhole(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.wholeMatch(in: self)) != nil
    }
}
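Both wrappers take a Regex<Substring>, which only fits patterns without capture groups: a literal such as #/(\w+)@(\w+)/# has type Regex<(Substring, Substring, Substring)> and would be rejected. One possible generalization (an assumption, not part of the answer) makes the wrappers generic over the regex's Output type. The kind(of:) helper is hypothetical and assumes Swift 5.7+.

```swift
extension String {
    // Generic over Output so regexes with capture groups are accepted too.
    func matchFirst<Output>(_ regex: Regex<Output>) -> Bool {
        // Errors count as "not match".
        (try? regex.firstMatch(in: self)) != nil
    }

    func matchWhole<Output>(_ regex: Regex<Output>) -> Bool {
        // Errors count as "not match".
        (try? regex.wholeMatch(in: self)) != nil
    }
}

// Hypothetical usage: both choices at the point of use, with and without captures.
func kind(of input: String) -> String {
    switch input {
    case let s where s.matchWhole(#/(\w+)@(\w+)/#):
        return "user@host"
    case let s where s.matchWhole(#/\w+/#):
        return "word"
    case let s where s.matchFirst(#/\w+/#):
        return "contains a word"
    default:
        return "no word"
    }
}

print(kind(of: "me@example"))  // user@host
print(kind(of: "abc"))         // word
print(kind(of: "--abc--"))     // contains a word
print(kind(of: "---"))         // no word
```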

Historic Footnote

The ^ begin-anchor and $ end-anchor syntax has been part of regular expressions since the 1970s, with the qed and ed editors of PDP-7 Unix at AT&T Bell Labs.

QED Text Editor (1970 Bell Telephone Laboratories technical memorandum)


ed (see man ed or info ed on POSIX and Open Group compliant Unix-like systems)


See also man ed and info ed on modern BSD/Linux/Unix systems. It's still there.

The ^ begin-anchor and $ end-anchor syntax was also carried forward into other regular-expression-enabled tools, such as sed, grep (g/re/p, "global regular expression print"), the Perl Compatible Regular Expressions (PCRE) library, and the POSIX Basic Regular Expression (BRE) syntax.

If the /^…$/ pattern is implied and hidden for the sake of compact code, the expressive capability of the regex is reduced.

Seeing, reading, and writing with ^ begin-anchor and $ end-anchor syntax can be natural (and even expected) for an experienced REGEX user.
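As a small illustration of why explicit anchors can be more expressive than an implied whole-string match: with firstMatch(in:) each end can be anchored independently, which a wholeMatch-only check cannot express without rewriting the pattern. A minimal sketch, assuming Swift 5.7+ and single-line input; found(_:in:) is a hypothetical helper in the spirit of the answer's matchFirst.

```swift
// Hypothetical helper, equivalent in spirit to the answer's matchFirst wrapper.
func found(_ regex: Regex<Substring>, in input: String) -> Bool {
    // Errors count as "not match".
    (try? regex.firstMatch(in: input)) != nil
}

let input = "abc---"

print(found(#/^\w+$/#, in: input))  // false: both anchors, whole string must be word characters
print(found(#/^\w+/#, in: input))   // true: anchored at the start only
print(found(#/\w+$/#, in: input))   // false: anchored at the end only
print(found(#/\w+/#, in: input))    // true: unanchored, anywhere in the string
```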