I was taking a look at the publicsuffix library in Go (golang.org/x/net/publicsuffix) and found it pretty useful for extracting domains out of strings. This is what I have:
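    package main

    import (
        "fmt"

        "golang.org/x/net/publicsuffix"
    )

    func main() {
        url := "a.very.complex-domain.co.uk"
        // The error is ignored here; it is non-nil when no eTLD+1 can be derived.
        u, _ := publicsuffix.EffectiveTLDPlusOne(url)
        // Println rather than Printf: u is data, not a format string.
        fmt.Println(u)
    }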
This works fine, yielding complex-domain.co.uk as the valid domain. However, the problem I am facing is that when any random string containing a dot is passed to the function, the library returns a domain name anyway, even if the TLD doesn't exist in the public suffix list:
    package main

    import (
        "fmt"

        "golang.org/x/net/publicsuffix"
    )

    func main() {
        url := "a.very.complex-domain.someinvalidtld"
        // No error is returned here even though the TLD is not in the list.
        u, _ := publicsuffix.EffectiveTLDPlusOne(url)
        fmt.Println(u)
    }
Gives: complex-domain.someinvalidtld
My understanding is that the publicsuffix package assumes that it is a local domain and parses it anyway. Is there a way to avoid this behavior and extract only the valid ones?
I figured it out: you can easily do it using the same library, by checking the public suffix before extracting the domain. For an input whose TLD is not in the list, the check returns:
Invalid
The logic behind this is: for every public suffix that isn't ICANN managed, if it has a . in it, it is privately managed (e.g. blogspot.co.uk); otherwise it is an invalid TLD.