Regex: How to expand regex for domain name to include domains of type a.b.c.d?

The following regex picks up most mail server names of the form mail.example.com

([a-zA-Z0-9-]+\.{1,}[a-zA-Z0-9-]+\.[a-zA-Z0-9-]{2,})

as shown here.

How do we expand it such that it matches domains with one (or more) additional subdomains e.g.

b-app05-06.boldchat.com
ns126a.ba1.enops.net
NHQSDFEXCHUB01.nam.coair.com
ncsmcexchub01.nam.coair.com

There are 4 answers

4
Wiktor Stribiżew On BEST ANSWER

You can enclose the repeated subpattern in a non-capturing group and set a {2,} quantifier on it:

\b[\w-]+(?:\.[\w-]+){2,}\b

EXPLANATION:

  • \b - Word boundary
  • [\w-]+ - One or more word characters (letters, digits, or underscore) or hyphens
  • (?:\.[\w-]+){2,} - A non-capturing group that matches 2 or more sequences of a literal dot followed by 1 or more word characters or hyphens
  • \b - Word boundary

See demo
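
To try the pattern outside a demo page, here is a minimal Python sketch. The function name `find_hosts` and the sample text are made up for illustration; only the regex comes from the answer.

```python
import re

# Pattern from the answer: one label, then two or more ".label" groups.
HOST_RE = re.compile(r"\b[\w-]+(?:\.[\w-]+){2,}\b")

def find_hosts(text):
    """Return every name in text with at least three dot-separated labels."""
    return HOST_RE.findall(text)

# Hypothetical sample line built from the host names in the question.
sample = "Relays: NHQSDFEXCHUB01.nam.coair.com, ns126a.ba1.enops.net; skip example.com"
print(find_hosts(sample))
```

Because the group is non-capturing, `findall` returns whole matches. A two-label name such as example.com is skipped, since {2,} requires at least two dot-plus-label groups after the first label.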

2
Tensibai On

You can take advantage of regex recursion if your engine supports it (usually a PCRE-compatible one). Demo here

This regex would work for any number of subdomains and allow you to capture the inner domain.

(([\w-]+)[.](\w{2,}$|(?1)))

In detail:

  • (([\w-]+)[.] - Opens the outer capture group that the recursion refers to, then captures the leftmost subdomain, followed by a dot.
  • (\w{2,}$|(?1))) - Alternation: either match the TLD at the end of the string, or recurse into the whole pattern.

The host is in the second capture group, its domain is in the third, and the whole match is in the first (see the substitution pane in the demo).
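
Python's standard `re` module does not support `(?1)` recursion, so the pattern above cannot be used there as written. As a rough stand-in, the sketch below gets the same host/domain split with a plain repetition instead of recursion. It is a different technique, not the recursive one. The name `split_host` is hypothetical, the input is assumed to be a bare host name, and the group numbers differ from the answer's (host is group 1 here, domain is group 2).

```python
import re

# Non-recursive stand-in: group 1 is the leftmost label (the host),
# group 2 is everything after the first dot (its domain).
SPLIT_RE = re.compile(r"([\w-]+)\.((?:[\w-]+\.)*\w{2,})")

def split_host(name):
    """Return (host, domain) for a bare host name, or None if it does not match."""
    m = SPLIT_RE.fullmatch(name)
    return (m.group(1), m.group(2)) if m else None

print(split_host("ns126a.ba1.enops.net"))
```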

2
Mathieu David On

You can make it shorter:

((?:[a-zA-Z0-9-]+\.)+[a-zA-Z0-9-]{2,})$

Demo
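
Since this pattern ends with `$`, it only finds a host name at the end of the input. A small Python sketch, assuming one host per line and using made-up log lines: `re.MULTILINE` is added here (it is not part of the answer) so that `$` matches at the end of each line.

```python
import re

# Pattern from the answer; re.MULTILINE makes "$" match at every line end.
TAIL_HOST_RE = re.compile(r"((?:[a-zA-Z0-9-]+\.)+[a-zA-Z0-9-]{2,})$", re.MULTILINE)

# Hypothetical input, one candidate host name per line.
log = """connect to b-app05-06.boldchat.com
relay ncsmcexchub01.nam.coair.com
no host on this line"""

hosts = TAIL_HOST_RE.findall(log)
print(hosts)
```

Note that the `+` on the group also accepts two-label names such as example.com; change it to `{2,}` if at least three labels are required.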

3
sp00m On

First, here is how to match one domain (given your examples):

[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*

A TLD is matched with:

[a-z]{2,}

Now, you can have several domains separated by dots, followed by a TLD:

((?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.)+[a-z]{2,})

Debuggex Demo
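
The same build-up can be sketched in Python by composing the pieces as strings. The names `LABEL`, `TLD`, `DOMAIN_RE` and `is_domain` are illustrative only, and `fullmatch` assumes the input is a bare host name.

```python
import re

# Building blocks from the answer.
LABEL = r"[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*"   # one label, no leading/trailing hyphen
TLD = r"[a-z]{2,}"                          # lowercase TLD, two or more letters

# One or more "label." groups followed by the TLD.
DOMAIN_RE = re.compile(rf"((?:{LABEL}\.)+{TLD})")

def is_domain(name):
    """True if the whole string is a dotted name ending in a TLD."""
    return DOMAIN_RE.fullmatch(name) is not None

print(is_domain("b-app05-06.boldchat.com"))
```

Unlike a plain `[a-zA-Z0-9-]+` class, this label pattern rejects names where a label starts or ends with a hyphen.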


If you need to match domains composed of at least two subdomains plus a TLD:

((?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.){2,}[a-z]{2,})

Debuggex Demo
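
A quick way to see what the {2,} quantifier changes is to run the stricter pattern against names with different label counts. This is a minimal Python sketch; `STRICT_RE` and `has_two_subdomains` are hypothetical names, and the input is assumed to be a bare host name.

```python
import re

# Stricter pattern from the answer: at least two "label." groups before the TLD.
STRICT_RE = re.compile(r"((?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.){2,}[a-z]{2,})")

def has_two_subdomains(name):
    """True if the name has at least two labels in front of the TLD."""
    return STRICT_RE.fullmatch(name) is not None

for name in ("example.com", "mail.example.com", "ns126a.ba1.enops.net"):
    print(name, has_two_subdomains(name))
```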