I have set up a simple domain in my Azure subscription by creating a domain controller in an Azure VM, with all of the associated DNS setup and following documented best practices. This is a cloud-only domain on a cloud-only vnet; there is no on-premises connectivity. I have provisioned and joined a handful of VMs to the domain. Now, when I provision new VMs, they have trouble joining the domain (often failing to join at all), and DNS lookups from these machines often time out, especially for internet addresses. How can I fix this?
Details
I have set up a domain controller on an Azure VM following the practices and steps in "Install a new Active Directory forest on an Azure virtual network" and "Guidelines for Deploying Windows Server Active Directory on Azure Virtual Machines", with the exception that I did not put the AD database on a separate data disk. In addition, I have added 168.63.129.16 as a second DNS address (the first address is the internal vnet address of the DC, which I have made static using Set-AzureStaticVNetIP) in the Virtual Network settings so that the machines on the domain can reach the internet.
I use the PowerShell cmdlets to provision new machines and have them automatically joined to the domain using the -WindowsDomain switch and associated parameters of Add-AzureProvisioningConfig when creating the VMs. I have provisioned the DC in one cloud service, and all other machines in another cloud service. All are on the same vnet subnet, and all of this is in one affinity group. I have provisioned and joined about 15 machines, about ten of which are still running (others deleted).
Provisioning a new VM usually takes about 11-12 minutes. Now it takes upwards of 30-35 minutes, and when it completes, the machine has failed to join the domain. DNS lookups across the board are slow and often time out (especially for internet addresses); on the new machines that could not join the domain, lookups often fail completely. Pinging the DC from these new machines fails, while pinging it from machines that joined the domain earlier succeeds.
I am not sure whether the number of machines on the domain/vnet/cloud service/subscription is the cause of this problem, but I didn't see it until I had been using the domain for a while and had spun up a number of machines.
One of the more common causes is AD DNS returning an IP address that cannot be reached on the internal network, so the new machine cannot contact the DC to join the domain. When you run nslookup on yourdomain.local, does it respond with only IPs that are reachable on the internal, private network?
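If you would rather script that check than read nslookup output by eye, here is a minimal Python sketch. It assumes Python 3 is available on the affected VM and uses only the standard library; the function names (resolve_ipv4, split_private_public, check) are my own for illustration, and yourdomain.local is a placeholder for your real domain name. Note that it goes through the system resolver, so it sees whatever the VM's configured DNS servers return.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def split_private_public(addresses):
    """Split address strings into (private, public) lists, keeping input order."""
    private, public = [], []
    for addr in addresses:
        # is_private is True for RFC 1918 ranges such as 10.0.0.0/8 and 192.168.0.0/16
        if ipaddress.ip_address(addr).is_private:
            private.append(addr)
        else:
            public.append(addr)
    return private, public


def check(hostname):
    """Resolve hostname and report any addresses outside the private ranges."""
    private, public = split_private_public(resolve_ipv4(hostname))
    print("private:", private)
    print("public: ", public)
    return not public  # True when every returned address is private
```

Calling check("yourdomain.local") on one of the machines that failed to join should print every address returned for the domain. Anything listed under "public" is an address the new VM probably cannot use to reach the DC over the vnet.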