Rails 6 - RFC5322 compliant email validation


This is a PCRE regex (https://regex101.com/r/gJ7pU0/1) that can validate email addresses.

Is there an RFC5322 compliant regex for Ruby? Ruby has URI::MailTo::EMAIL_REGEXP, but I do not think it is RFC5322 compliant.
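
A quick way to check that suspicion, assuming the stock URI::MailTo::EMAIL_REGEXP constant from Ruby's uri library. The sample addresses are illustrative only; the second and third use the quoted-string and domain-literal forms that the RFC5322 grammar below allows:

```ruby
require 'uri'

# Plain dot-atom address: accepted by the built-in pattern.
p URI::MailTo::EMAIL_REGEXP.match?('user@example.com')        # => true

# Quoted-string local part: allowed by the grammar, rejected here.
p URI::MailTo::EMAIL_REGEXP.match?('"john doe"@example.com')  # => false

# Domain literal in square brackets: allowed by the grammar, rejected here.
p URI::MailTo::EMAIL_REGEXP.match?('user@[192.0.2.1]')        # => false
```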

Another post mentioned this 'mail' gem, but I do not see a way to validate email addresses with it.

https://github.com/mikel/mail/tree/6b0ebb142c476bf7c00524effe513a4f151f59ab

PCRE RFC5322 Compliant

(?(DEFINE)
    (?<addr_spec> (?&local_part) @ (?&domain) )
    (?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
    (?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
    (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dtext) )* (?&FWS)? \] (?&CFWS)? )
    (?<dtext> [\x21-\x5a] | [\x5e-\x7e] | (?&obs_dtext) )
    (?<quoted_pair> \\ (?: (?&VCHAR) | (?&WSP) ) | (?&obs_qp) )
    (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)? )
    (?<dot_atom_text> (?&atext) (?: \. (?&atext) )* )
    (?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ )
    (?<atom> (?&CFWS)? (?&atext) (?&CFWS)? )
    (?<word> (?&atom) | (?&quoted_string) )
    (?<quoted_string> (?&CFWS)? " (?: (?&FWS)? (?&qcontent) )* (?&FWS)? " (?&CFWS)? )
    (?<qcontent> (?&qtext) | (?&quoted_pair) )
    (?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&obs_qtext) )
    # comments and whitespace
    (?<FWS> (?: (?&WSP)* \r\n )? (?&WSP)+ | (?&obs_FWS) )
    (?<CFWS> (?: (?&FWS)? (?&comment) )+ (?&FWS)? | (?&FWS) )
    (?<comment> \( (?: (?&FWS)? (?&ccontent) )* (?&FWS)? \) )
    (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
    (?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&obs_ctext) )
    # obsolete tokens
    (?<obs_domain> (?&atom) (?: \. (?&atom) )* )
    (?<obs_local_part> (?&word) (?: \. (?&word) )* )
    (?<obs_dtext> (?&obs_NO_WS_CTL) | (?&quoted_pair) )
    (?<obs_qp> \\ (?: \x00 | (?&obs_NO_WS_CTL) | \n | \r ) )
    (?<obs_FWS> (?&WSP)+ (?: \r\n (?&WSP)+ )* )
    (?<obs_ctext> (?&obs_NO_WS_CTL) )
    (?<obs_qtext> (?&obs_NO_WS_CTL) )
    (?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )
    # character class definitions
    (?<VCHAR> [\x21-\x7E] )
    (?<WSP> [ \t] )
)
^(?&addr_spec)$

There is 1 answer

Wiktor Stribiżew (best answer)

PCRE to Onigmo recursion/subroutine regex conversion is straightforward:

  • Remove the unsupported (?(DEFINE)...) construct
  • Put all the named groups used to define the consuming pattern at the start of the regex and apply a {0} quantifier to each of them, so that they match nothing on their own and only serve as definitions
  • Replace the (?&...) syntax with \g<...> (I did this in Notepad++, replacing \(\?&(\w+)\) with \\g<$1>).
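
The three steps above can be sketched in Ruby itself. This is only an illustration: pcre_define_to_onigmo is a hypothetical helper, the key=value grammar is made up, and the helper assumes one named group per line, as in the pattern from the question.

```ruby
# Hypothetical miniature PCRE pattern laid out like the one in the question.
pcre_source = <<~'PCRE'
  (?(DEFINE)
    (?<pair> (?&key) = (?&value) )
    (?<key> [a-z]+ )
    (?<value> [0-9]+ )
  )
  ^(?&pair)$
PCRE

# Hypothetical helper; assumes each definition sits on its own line.
def pcre_define_to_onigmo(source)
  converted = source.each_line.map do |raw|
    line = raw.strip
    case line
    when /\A\(\?\(DEFINE\)\z/ then nil            # step 1: drop the "(?(DEFINE)" opener
    when /\A\)\z/             then nil            # step 1: drop its closing ")"
    when /\A\(\?<\w+>/        then "#{line}{0}"   # step 2: definition matches nothing
    else line
    end
  end
  # Step 3: rewrite PCRE subroutine calls (?&name) as Onigmo calls \g<name>.
  converted.compact.join("\n").gsub(/\(\?&(\w+)\)/) { "\\g<#{$1}>" }
end

onigmo_source = pcre_define_to_onigmo(pcre_source)
puts onigmo_source

# Extended mode keeps the whitespace in the pattern insignificant.
re = Regexp.new(onigmo_source, Regexp::EXTENDED)
p re.match?('abc=123')  # => true
p re.match?('abc=')     # => false
```

For a one-off conversion, the editor search-and-replace described above is just as good; the helper only makes the steps explicit.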

The final expression that will work in Ruby looks like

re = /(?<addr_spec> \g<local_part> @ \g<domain> ){0}
(?<local_part> \g<dot_atom> | \g<quoted_string> | \g<obs_local_part> ){0}
(?<domain> \g<dot_atom> | \g<domain_literal> | \g<obs_domain> ){0}
(?<domain_literal> \g<CFWS>? \[ (?: \g<FWS>? \g<dtext> )* \g<FWS>? \] \g<CFWS>? ){0}
(?<dtext> [\x21-\x5a] | [\x5e-\x7e] | \g<obs_dtext> ){0}
(?<quoted_pair> \\ (?: \g<VCHAR> | \g<WSP> ) | \g<obs_qp> ){0}
(?<dot_atom> \g<CFWS>? \g<dot_atom_text> \g<CFWS>? ){0}
(?<dot_atom_text> \g<atext> (?: \. \g<atext> )* ){0}
(?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ ){0}
(?<atom> \g<CFWS>? \g<atext> \g<CFWS>? ){0}
(?<word> \g<atom> | \g<quoted_string> ){0}
(?<quoted_string> \g<CFWS>? " (?: \g<FWS>? \g<qcontent> )* \g<FWS>? " \g<CFWS>? ){0}
(?<qcontent> \g<qtext> | \g<quoted_pair> ){0}
(?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | \g<obs_qtext> ){0}
# comments and whitespace
(?<FWS> (?: \g<WSP>* \r\n )? \g<WSP>+ | \g<obs_FWS> ){0}
(?<CFWS> (?: \g<FWS>? \g<comment> )+ \g<FWS>? | \g<FWS> ){0}
(?<comment> \( (?: \g<FWS>? \g<ccontent> )* \g<FWS>? \) ){0}
(?<ccontent> \g<ctext> | \g<quoted_pair> | \g<comment> ){0}
(?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | \g<obs_ctext> ){0}
# obsolete tokens
(?<obs_domain> \g<atom> (?: \. \g<atom> )* ){0}
(?<obs_local_part> \g<word> (?: \. \g<word> )* ){0}
(?<obs_dtext> \g<obs_NO_WS_CTL> | \g<quoted_pair> ){0}
(?<obs_qp> \\ (?: \x00 | \g<obs_NO_WS_CTL> | \n | \r ) ){0}
(?<obs_FWS> \g<WSP>+ (?: \r\n \g<WSP>+ )* ){0}
(?<obs_ctext> \g<obs_NO_WS_CTL> ){0}
(?<obs_qtext> \g<obs_NO_WS_CTL> ){0}
(?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f ){0}
# character class definitions
(?<VCHAR> [\x21-\x7E] ){0}
(?<WSP> [ \t] ){0}
\A\g<addr_spec>\z/x

See a Ruby test:

p re.match?('user@example.com')         # => true
p re.match?('test@[123.123.123.123')    # => false
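
One Ruby-specific detail is worth checking when a pattern like this is used for validation: ^ and $ match at line boundaries, whereas \A and \z match only at the boundaries of the whole string. A minimal sketch of the difference, using a deliberately simplified, hypothetical pattern rather than the full grammar:

```ruby
# Hypothetical simplified patterns; only the anchors differ.
LINE_ANCHORED   = /^[a-z]+@[a-z.]+$/
STRING_ANCHORED = /\A[a-z]+@[a-z.]+\z/

# A multi-line input whose first line looks like an address.
input = "user@example.com\n<script>alert(1)</script>"

p LINE_ANCHORED.match?(input)                 # => true
p STRING_ANCHORED.match?(input)               # => false
p STRING_ANCHORED.match?('user@example.com')  # => true
```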