I have a regex to match all m3u8 URL

(https?:\/\/(www\.)?[[email protected]:%._\+~#=]{2,256}\.[a-z]{2,6}\b([[email protected]:%_\+.~#,?&*//=]*)(.m3u8)\b([[email protected]:%_\+.~#,?&//=]*))

This regex matched all m3u8 url with domain, for example: https://mydomain.example/hls/playlist.m3u8

But I have some case need to match with url with the hostname as an IP address like, http://192.168.1.199:8001/hls/playlist.m3u8

I find a regex for only IP matching. How can I combine regex to match all kind of hls url even it use hostname or IP

Thanks

1 Answers

2
Devesh Kumar Singh On

I would suggest using urllib.parse.urlparse for this.

from urllib import parse

p1 = parse.urlparse('https://mydomain.example/hls/playlist.m3u8')
print(p1.netloc)
print(p1.path)
#mydomain.example
#/hls/playlist.m3u8
p2 = parse.urlparse('http://192.168.1.199:8001/hls/playlist.m3u8')
print(p2.netloc)
print(p2.path)
#192.168.1.199:8001
#/hls/playlist.m3u8

The named tuple returned from this function also gives you other information. e.g.

print(p1)
#ParseResult(scheme='https', netloc='mydomain.example', path='/hls/playlist.m3u8', params='', query='', fragment='')