I am trying to extract all domain names from the COM and NAME DNS zone files. Those zone files contain all the DNS entries, and there seems to be a lack of information about their structure.
Do all registered domains have NS entries, even those that are not actively used? Which record or records should I use to extract the domain names?
The zone files are very large, and sorting them would be a bad idea. So if I can use a single DNS record type to extract the domain names, it would be easier. I found this Python script (I don't know Python) on GitHub which uses only NS entries. Is it logically correct?
Someone with experience please comment.
The format of the DNS zone file is defined in RFC 1035 (section 5) and RFC 1034 (section 3.6.1). You can find many details on Wikipedia: https://en.wikipedia.org/wiki/Zone_file
It contains only the published domain names, that is, those having at least one nameserver and not being under clientHold or serverHold status (see http://www.icann.org/epp#clientHold and http://www.icann.org/epp#serverHold), which means, in short, that it is NOT all registered domain names.
The .COM zone file is huge indeed. In any case, you need to match on NS record lines and deduplicate domain names. There are multiple strategies to do that, depending on your constraints.
Note that many online providers already do this work for you and can directly provide the domain names if this is all you are interested in. Some may also provide differential content, that is, the changes from one day to the next.
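
As an illustration of matching on NS records and deduplicating, here is a minimal Python sketch. It is not the GitHub script mentioned in the question. The function names `ns_owner_names` and `unique_domains` are hypothetical. It assumes the RFC 1035 text format with one record per line and TTLs written as plain integers, and it does not parse multi-line records in parentheses (their continuation lines are simply ignored).

```python
from typing import Iterable, Iterator

CLASSES = {"in", "ch", "hs"}  # record classes that may precede the type


def ns_owner_names(lines: Iterable[str], origin: str) -> Iterator[str]:
    """Yield the fully qualified, lowercased owner name of each NS record."""
    origin = origin.lower().rstrip(".") + "."
    owner = origin
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop comments and the newline
        if not line:
            continue
        fields = line.lower().split()
        if fields[0].startswith("$"):
            # Honor $ORIGIN; ignore other directives such as $TTL.
            if fields[0] == "$origin" and len(fields) > 1:
                origin = fields[1].rstrip(".") + "."
            continue
        if not line[0].isspace():
            # A new owner name; a line starting with whitespace reuses the last one.
            name = fields.pop(0)
            if name == "@":
                owner = origin
            elif name.endswith("."):
                owner = name
            else:
                owner = name + "." + origin
        # Skip the optional TTL and class fields to reach the record type.
        while fields and (fields[0].isdigit() or fields[0] in CLASSES):
            fields.pop(0)
        # The zone's own NS records (owner equal to the origin) are not domains.
        if fields and fields[0] == "ns" and owner != origin:
            yield owner


def unique_domains(names: Iterable[str]) -> Iterator[str]:
    """Yield each name once, in first-seen order (keeps every name in memory)."""
    seen = set()
    for name in names:
        if name not in seen:
            seen.add(name)
            yield name
```

Passing an open file object as `lines` reads the zone file one line at a time, so the file itself is never loaded whole. The set in `unique_domains` still holds every distinct name, which may be too much memory for .COM. If the NS lines of a given domain are always adjacent in your file, comparing each name with the previous one is enough. Otherwise, writing the names to disk and deduplicating there is another option.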