Is there a way to extract only valid domains from the publicsuffix library?


I was taking a look at the publicsuffix library in Go and found it pretty useful for extracting domains from strings. This is what I have:

package main

import (
    "fmt"

    "golang.org/x/net/publicsuffix"
)

func main() {
    url := "a.very.complex-domain.co.uk"
    u, _ := publicsuffix.EffectiveTLDPlusOne(url)
    fmt.Println(u) // Println, not Printf: u is data, not a format string
}

This works fine, yielding complex-domain.co.uk as the valid domain. However, when I pass an arbitrary string that contains a dot, the library still returns a domain name, even if the TLD doesn't exist in the public suffix list:

package main

import (
    "fmt"

    "golang.org/x/net/publicsuffix"
)

func main() {
    url := "a.very.complex-domain.someinvalidtld"
    u, _ := publicsuffix.EffectiveTLDPlusOne(url)
    fmt.Println(u)
}

Gives: complex-domain.someinvalidtld

My understanding is that the publicsuffix package assumes the input is a local domain and parses it anyway. Is there a way to avoid this behavior and extract only valid domains?

Accepted answer by 0xInfection:

I figured it out: you can do this with the same library.

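import (
    "strings"

    "golang.org/x/net/publicsuffix"
)

// checkForValidTLD reports whether str ends in a suffix that is explicitly
// listed in the Public Suffix List (ICANN or private section).
func checkForValidTLD(str string) bool {
    etld, icann := publicsuffix.PublicSuffix(str)
    if icann { // ICANN managed
        return true
    }
    // Not ICANN managed: a multi-label suffix such as blogspot.co.uk comes
    // from the private section; a single label means no rule matched.
    return strings.IndexByte(etld, '.') >= 0
}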

Calling the function like this:

if checkForValidTLD("a.very.complex-domain.someinvalidtld") {
    fmt.Println("Valid")
} else {
    fmt.Println("Invalid")
}

prints: Invalid

The logic behind this: for any suffix that is not ICANN-managed, a dot in the suffix means it matched a privately managed entry (e.g. blogspot.co.uk); otherwise no rule in the list matched, and the TLD is invalid.