Regex: Tightening up an IPv4 regex to omit ESMTPSA id?


An ESMTPSA id is a string that looks something like:

w12sm4743917pbs.68.2015.06.04.16.21.51

It can appear in the Received: from header of an email, as in the following example:

Received: from [192.168.0.140] (n11649196059.netvigator.com. [116.49.196.59])
        by mx.google.com with ESMTPSA id w12sm4743917pbs.68.2015.06.04.16.21.51
        for <[email protected]>
        (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Jun 2015 16:21:52 -0700 (PDT)

I have the following regex, which works well at extracting IPv4 addresses from such a header:

\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

The problem is that it also extracts a chunk of the ESMTPSA id: 015.06.04.16.
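For reference, here is a minimal Python reproduction of the over-match. The header is a trimmed copy of the example above (the `for <...>` line is left out), and the variable names are only illustrative:

```python
import re

# Trimmed copy of the example header from the question.
header = (
    "Received: from [192.168.0.140] (n11649196059.netvigator.com. [116.49.196.59])\n"
    "        by mx.google.com with ESMTPSA id w12sm4743917pbs.68.2015.06.04.16.21.51\n"
    "        (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n"
    "        Thu, 04 Jun 2015 16:21:52 -0700 (PDT)\n"
)

# The unanchored pattern: four dot-separated runs of 1 to 3 digits.
naive = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

matches = naive.findall(header)
print(matches)
# ['192.168.0.140', '116.49.196.59', '015.06.04.16']
```

The third match starts in the middle of `2015`, because nothing in the pattern forbids a digit immediately before the first octet.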

How would we tighten the regex up so that it only extracts the IPv4 address? Note: the addresses are not always in square brackets, as in the above example. I'm using Python and I know I could use the ipaddress module to validate all matches, but it will be far more convenient for me to not match in the first place.

1 answer

AudioBubble (best answer):
[^\.\d]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[^\.\d]

Then trim one character from the start and end of each match (or use a capturing group).

PS: Alternatively, run your original regex on each match produced by mine.
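A rough Python sketch of both suggestions, assuming the first two lines of the header from the question as input. The names `padded`, `plain`, `trimmed` and `second_pass` are made up for illustration:

```python
import re

sample = (
    "Received: from [192.168.0.140] (n11649196059.netvigator.com. [116.49.196.59])\n"
    "        by mx.google.com with ESMTPSA id w12sm4743917pbs.68.2015.06.04.16.21.51\n"
)

# Pattern from this answer: one non-digit, non-dot character on each side.
padded = re.compile(r"[^\.\d]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[^\.\d]")
# The original pattern from the question, used here as a second pass.
plain = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

raw = padded.findall(sample)  # each match still carries its two delimiters

# Option 1: trim one character from each end of every match.
trimmed = [m[1:-1] for m in raw]

# Option 2: run the original pattern over each padded match.
second_pass = [plain.search(m).group() for m in raw]

print(raw)          # ['[192.168.0.140]', '[116.49.196.59]']
print(trimmed)      # ['192.168.0.140', '116.49.196.59']
print(second_pass)  # ['192.168.0.140', '116.49.196.59']
```

Both options give the same result here; the ESMTPSA id is skipped because every four-octet candidate inside it has a digit or a dot directly before it.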

PS2: The same pattern with a capturing group:

[^\.\d](\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[^\.\d]

Most regex tools let you retrieve a capturing group by its number (in order of appearance), for example with \1 or similar.
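In Python, a sketch of the capturing-group version could look like the following. `re.findall` returns the contents of the group when the pattern has exactly one. The second pattern is a lookaround variant that is not part of this answer; it is shown only as a possible alternative, because the bracketing `[^\.\d]` needs an actual character on each side and so cannot match an address at the very start or end of the input. The header is a trimmed copy of the one in the question.

```python
import re

header = (
    "Received: from [192.168.0.140] (n11649196059.netvigator.com. [116.49.196.59])\n"
    "        by mx.google.com with ESMTPSA id w12sm4743917pbs.68.2015.06.04.16.21.51\n"
    "        (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n"
    "        Thu, 04 Jun 2015 16:21:52 -0700 (PDT)\n"
)

# The answer's pattern: group 1 holds the address without its delimiters.
grouped = re.compile(r"[^\.\d](\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[^\.\d]")

# Alternative (an assumption, not from the answer): lookarounds assert that
# no digit or dot is adjacent, without consuming any character.
lookaround = re.compile(r"(?<![.\d])\d{1,3}(?:\.\d{1,3}){3}(?![.\d])")

grouped_hits = grouped.findall(header)
lookaround_hits = lookaround.findall(header)

print(grouped_hits)     # ['192.168.0.140', '116.49.196.59']
print(lookaround_hits)  # ['192.168.0.140', '116.49.196.59']

# Edge case: the input is nothing but an address.
print(grouped.findall("10.0.0.1"))     # []
print(lookaround.findall("10.0.0.1"))  # ['10.0.0.1']
```

Neither pattern checks that each octet is in the range 0 to 255, and both reject an address that is directly followed by a dot, such as one ending a sentence.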