I ran into the so-called "homograph attack" and I want to reject domains whose decoded punycode looks like plain ASCII letters and digits. For example, www.xn--80ak6aa92e.com is rendered in the browser (Firefox) so that it looks like www.apple.com. The two domains are visually identical, but the underlying characters are different. Chrome has already patched this and displays the punycode instead.
Here is an example:
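#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Net::IDN::Encode ':all';

# Emit UTF-8 so the decoded name prints without "Wide character" warnings
binmode(STDOUT, ':encoding(UTF-8)');

my $testdomain = "www.xn--80ak6aa92e.com";
# Decode the punycode labels into their Unicode form
my $IDN = domain_to_unicode($testdomain);
my $visual_result_ascii = "www.apple.com";

print "S1: $IDN\n";
print "S2: $visual_result_ascii\n";
# Never true here: the decoded label is not made of Latin letters
print "MATCH\n" if $IDN eq $visual_result_ascii;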
The two strings look the same, but they do not match. Is it possible to compare a Unicode string ($IDN) against an ASCII alphanumeric string that looks the same?
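To see why `eq` fails, it can help to print the code point and Unicode name of every character on both sides. The following is only a small diagnostic sketch; the helper name `describe_chars` is made up for this illustration, and it assumes Net::IDN::Encode is installed as in the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ();                  # provides charnames::viacode
use Net::IDN::Encode ':all';

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: one "U+XXXX NAME" string per character of a string.
sub describe_chars {
    my ($str) = @_;
    return map {
        sprintf 'U+%04X %s', ord($_), charnames::viacode(ord($_)) // '(no name)'
    } split //, $str;
}

# Decode only the suspicious label and list what it is really made of
my $label = domain_to_unicode('xn--80ak6aa92e');
print "$_\n" for describe_chars($label);

print "--\n";

# The same listing for the ASCII string it imitates
print "$_\n" for describe_chars('apple');
```

The two listings should share no code points at all, which is why a plain string comparison can never succeed here.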
After some research, and thanks to your comments, I have reached a conclusion. Most of the problems come from Cyrillic. This script contains many characters that look like Latin ones, and they can be combined in many ways.
I have identified some scammy IDN domains including these names:
With the font used here you may be able to see a difference, but in the browser there is absolutely no visual difference.
Using https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode I was able to build a table of 12 visually similar characters.
Update: I found 4 more Latin-like characters in the Cyrillic set, so there are 16 in total now.
These characters can be combined in many ways to create IDNs that look identical to legitimate domains.
The problem occurs in the second-level domain (SLD). Extensions (TLDs) can also be IDNs, but they are verified, cannot be spoofed, and are not affected by this issue. The domain registrar checks that all letters come from the same set, so an IDN that mixes Latin and non-Latin characters will not be accepted. Extra validation for mixed sets is therefore pointless.
My idea is simple. Split the domain and decode only the SLD part, then match it against a list of visually similar Cyrillic characters. If every letter is visually similar to a Latin one, the domain is almost certainly a scam.
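A rough sketch of that idea could look like the code below. Several things in it are assumptions: the `%LOOKALIKE` map is an illustrative set of Cyrillic code points picked for this example, not necessarily the same table as the one described above; the function names are made up; the SLD is naively taken as the label just before the last dot, which is wrong for suffixes such as co.uk; and ASCII digits and hyphens are treated as neutral.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::IDN::Encode ':all';

binmode(STDOUT, ':encoding(UTF-8)');

# ASSUMPTION: illustrative Cyrillic-to-Latin look-alike map, not a complete list.
my %LOOKALIKE = (
    "\x{0430}" => 'a', "\x{0441}" => 'c', "\x{0501}" => 'd', "\x{0435}" => 'e',
    "\x{04BB}" => 'h', "\x{0456}" => 'i', "\x{0458}" => 'j', "\x{04CF}" => 'l',
    "\x{043E}" => 'o', "\x{0440}" => 'p', "\x{051B}" => 'q', "\x{0455}" => 's',
    "\x{0475}" => 'v', "\x{051D}" => 'w', "\x{0445}" => 'x', "\x{0443}" => 'y',
);

# True if every letter of an already-decoded label is in the look-alike map.
# Digits and hyphens are skipped; at least one look-alike must be present.
sub is_all_lookalike {
    my ($label) = @_;
    my $seen = 0;
    for my $ch (split //, $label) {
        next if $ch =~ /^[0-9-]$/;
        return 0 unless exists $LOOKALIKE{$ch};
        $seen++;
    }
    return $seen > 0 ? 1 : 0;
}

# Replace each look-alike with the Latin letter it imitates.
sub latin_skeleton {
    my ($label) = @_;
    return join '', map { $LOOKALIKE{$_} // $_ } split //, $label;
}

# Returns the imitated Latin name if the SLD is suspicious, undef otherwise.
sub check_domain {
    my ($domain) = @_;
    my @labels = split /\./, $domain;
    return undef unless @labels >= 2;
    my $sld = $labels[-2];                 # naive SLD choice (assumption)
    return undef unless $sld =~ /^xn--/i;  # only punycode labels matter here
    my $decoded = domain_to_unicode($sld);
    return is_all_lookalike($decoded) ? latin_skeleton($decoded) : undef;
}

my $hit = check_domain('www.xn--80ak6aa92e.com');
print defined $hit ? "suspicious: looks like $hit\n" : "ok\n";
```

Returning the Latin "skeleton" rather than a plain yes/no makes it possible to compare the result against a list of brand names you care about, which is one way to approach the original question of matching $IDN against an ASCII string.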