IDN homograph attack

A homograph attack (also homographic attack, or homograph or homographic phishing) is a method of spoofing in which the attacker exploits the similar appearance of different characters to present computer users with a false identity, especially in domain names. The attacker lures the user to a domain name that looks almost the same as a well-known domain name but leads elsewhere, for example to a phishing site.

With the introduction of internationalized domain names, many writing systems beyond the ASCII character set became available for domain names, and these contain a number of similar-looking characters. This multiplies the opportunities for homograph attacks.

Homographs in ASCII

ASCII contains characters that look similar to each other: the digit 0 resembles the letter O, and the letter l (lowercase L), the letter I (capital i) and the digit 1 resemble one another. (Such mutually similar or identical-looking characters are called homoglyphs; they can be used to write homographs, words that look the same but do not mean the same thing.)

Examples of possible spoofing attacks are the domain G00GLE.COM, which looks similar to GOOGLE.COM in some fonts, or a domain written with a capital i in place of a lowercase L, which looks similar to the original. PayPal was in fact the target of a phishing attack that used a domain spelled with a capital i.

In some proportional fonts such as Tahoma (the default for the address bar in Windows XP), homographs arise when a c is placed before a j, l or i: cl resembles d, cj resembles g, and ci resembles a. The long s (ſ) is easily confused with f, but it is now evaluated as "s" in URLs.
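
The ASCII confusions described above can be sketched as a simple folding step. This is a minimal illustration, not an established algorithm: the folding table, the function names and the domain exampIe.com are assumptions made for the example.

```python
# Fold the ASCII homoglyphs named in the text into one representative each.
# The table is an illustrative assumption, not an exhaustive list.
ASCII_FOLD = str.maketrans({"0": "o", "1": "l", "i": "l"})


def ascii_skeleton(name: str) -> str:
    """Lowercase the name, then map 0 -> o and 1, i -> l."""
    return name.lower().translate(ASCII_FOLD)


def looks_like(candidate: str, trusted: str) -> bool:
    """True if the names differ (ignoring case) but share a skeleton."""
    return (
        candidate.lower() != trusted.lower()
        and ascii_skeleton(candidate) == ascii_skeleton(trusted)
    )


if __name__ == "__main__":
    print(looks_like("G00GLE.COM", "GOOGLE.COM"))
    print(looks_like("exampIe.com", "example.com"))  # capital i, hypothetical
```

Because domain names are not case-sensitive, the sketch lowercases first; as a side effect a lowercase i is also folded to l, so the check errs on the side of flagging too much.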

Homographs in internationalized domain names

In multilingual computer systems, logically different characters may look the same. For example, the Unicode character U+0430, Cyrillic small letter a ("а"), looks the same as the Unicode character U+0061, Latin small letter a ("a").

The problem arises from the different ways in which the characters are processed by the user's mind and by computer software. From the user's point of view, a Cyrillic "а" within a Latin string is a Latin "a"; in most fonts there is no difference between the glyphs for these characters. The computer, however, treats them as different characters when it processes the string as an identifier. The user's assumption that there is a one-to-one relationship between the visual appearance of a name and the named object fails here.

Internationalized domain names provide a backwards-compatible method of using the full Unicode character set for domain names. This standard is already widely implemented. However, it extends the character set from a few dozen to many thousands of characters, so the scope for homograph attacks increases significantly.

Evgeniy Gabrilovich and Alex Gontmakher of the Technion in Haifa published an essay entitled "The homograph attack" in 2001. They describe it as a spoofing attack using Unicode URLs. To demonstrate the feasibility of this method of attack, they successfully registered a variant of a domain name that contained Russian letters.

That this would create such problems was anticipated before the introduction of IDN. Guidelines were issued to help registries avoid or reduce the problem. For example, it was recommended that registries accept only characters from the Latin alphabet and from the script of their own country, not the complete Unicode character set. This recommendation, however, was disregarded by major top-level domains.

On February 7, 2005, Slashdot reported that this exploit had been disclosed at the ShmooCon hacker conference. In web browsers that support IDNA, the URL http://www.pа, in which the first a is replaced by a Cyrillic а, appeared to lead to the website of the payment service PayPal, while in reality a different site was reached.
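
The gap between what the user sees and what the software compares can be made visible with Python's standard unicodedata module. The helper name describe and the sample word are assumptions made for this illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and the Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, rendered like "a" in most fonts

if __name__ == "__main__":
    print(describe(latin_a))     # U+0061 LATIN SMALL LETTER A
    print(describe(cyrillic_a))  # U+0430 CYRILLIC SMALL LETTER A
    # The strings look alike on screen but are different identifiers.
    print("page" == "p\u0430ge")  # False
```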
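
The backwards compatibility comes from encoding each non-ASCII label into an ASCII form that starts with "xn--". A minimal sketch using Python's built-in idna codec (which implements the older IDNA 2003 rules) is shown here; the function names and the spoofed domain are assumptions for illustration only.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    genuine = "example.com"
    spoof = "ex\u0430mple.com"  # hypothetical name with a Cyrillic а
    print(to_ascii(genuine))    # pure ASCII names pass through unchanged
    print(to_ascii(spoof))      # the label is rewritten with an xn-- prefix
```

The two names are indistinguishable in many fonts, yet their ASCII forms differ, which is why showing the encoded form exposes the substitution.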


The Cyrillic alphabet is the one most commonly used for homograph attacks. The Cyrillic letters а, с, е, о, р, х and у look almost or exactly like the Latin letters a, c, e, o, p, x and y. The Cyrillic letters З, Ч and б resemble the digits 3, 4 and 6.

Italic type creates further possibilities for confusion: italic дтпи (дтпи in upright type) resembles gmnu. (In many fonts, however, italic д resembles the partial derivative sign ∂.)

If uppercase letters are taken into account, ВНКМТ can be confused with BHKMT, as can the uppercase versions of the lowercase Cyrillic homographs mentioned above.

Suitable non-Russian Cyrillic letters and their Latin counterparts are һ and h, і and i, ј and j, ѕ and s, and Ғ and F. The letters ё and ї can be used to imitate ë and ï.
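
The lowercase pairs listed above can be turned into a small lookup table for spotting lookalike names. This is only a sketch: the table covers just the seven pairs named in the text, and the function names and sample domain are assumptions.

```python
# Cyrillic а с е о р х у mapped to Latin a c e o p x y (pairs from the text).
CYRILLIC_TO_LATIN = str.maketrans(
    "\u0430\u0441\u0435\u043e\u0440\u0445\u0443",
    "aceopxy",
)


def latin_skeleton(name: str) -> str:
    """Replace the listed Cyrillic lookalikes with their Latin twins."""
    return name.lower().translate(CYRILLIC_TO_LATIN)


def is_lookalike(candidate: str, trusted: str) -> bool:
    """True if candidate differs from trusted but folds to the same name."""
    return (
        candidate.lower() != trusted.lower()
        and latin_skeleton(candidate) == latin_skeleton(trusted)
    )


if __name__ == "__main__":
    spoof = "\u0435x\u0430mpl\u0435.com"  # hypothetical, Cyrillic е and а
    print(is_lookalike(spoof, "example.com"))
```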


In the Greek alphabet, only omicron (ο) and sometimes nu (ν) look the same as Latin lowercase letters of the kind used in URLs. In italic fonts the Latin a resembles the Greek alpha (α).

If approximate similarity is also accepted, the Greek letters εικηρτυωχγ are added, which can be confused with eiknptuwxy. If uppercase letters are used as well, the list grows considerably: Greek ΑΒΕΗΙΚΜΝΟΡΤΧΥΖ looks identical to Latin ABEHIKMNOPTXYZ.

In some fonts the Greek beta β may be mistaken for the German "sharp s" ß. MS-DOS code page 437 actually uses ß in place of β. The Greek small sigma ς can be confused with the small Latin c with cedilla, ç.

The accented Greek characters όίά look deceptively similar to óíá in many fonts, although the third letter, alpha, again resembles the Latin a only in some italic fonts.


The Armenian alphabet also contains letters that are suitable for homograph attacks: ցհոօզս looks like ghnoqu, յ resembles j (although it has no dot), and ք can look like p or f, depending on the font used. Two Armenian letters (Ձշ) can also look similar to the digit 2, and one (վ) sometimes resembles the digit 4.

However, using the Armenian alphabet is not easy. Most standard fonts include Greek and Cyrillic characters but no Armenian ones. In Windows, Armenian characters are therefore usually rendered in a special font (Sylfaen), so that the mixing is visible. In addition, the Latin g and the Armenian ց are designed differently in this font.

The Hebrew alphabet is rarely used for spoofing. Three of its characters are suitable for it: Samekh (ס) can resemble an o, Vav with a diacritical point (וֹ) resembles an i, and Chet (ח) resembles an n. Some Hebrew letters resemble other characters less clearly and are therefore better suited to foreign branding than to homograph attacks.

Since the Hebrew script is written from right to left, difficulties may arise when it is combined in bidirectional text with characters that are written from left to right.
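
Whether a label combines right-to-left and left-to-right letters can be checked with the bidirectional class that Unicode assigns to each character. The sketch below uses unicodedata.bidirectional from the Python standard library; the function names and sample strings are assumptions, and only letters are inspected.

```python
import unicodedata


def letter_directions(label: str) -> set:
    """Collect the Unicode bidirectional classes of the letters in a label."""
    return {unicodedata.bidirectional(ch) for ch in label if ch.isalpha()}


def mixes_directions(label: str) -> bool:
    """True if the label has both left-to-right and right-to-left letters."""
    classes = letter_directions(label)
    has_rtl = bool(classes & {"R", "AL"})  # Hebrew is "R", Arabic is "AL"
    has_ltr = "L" in classes
    return has_rtl and has_ltr


if __name__ == "__main__":
    print(mixes_directions("abc"))            # Latin only
    print(mixes_directions("ab\u05e1"))       # Latin plus Hebrew Samekh
    print(mixes_directions("\u05e1\u05d7"))   # Hebrew only
```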


The simplest protective measure is for a web browser not to support IDNA and similar features, or for users to switch off these features in their browser. This can mean that access to websites with internationalized domain names (IDNs) is blocked. Usually, however, browsers allow access and display the URLs in Punycode. In both cases, the use of domain names containing non-ASCII characters is prevented.

Firefox and Opera display IDNs in Punycode unless the top-level domain (TLD) defends against homograph attacks by restricting the characters allowed in domain names. Both browsers allow users to add TLDs to the permitted list manually.

Internet Explorer 7 allows IDNs, but not labels that mix the writing systems of different languages. Such mixed labels are shown in Punycode. Exceptions are locales where it is common to mix ASCII letters with local writing systems.

As an additional layer of protection, Internet Explorer 7, Firefox 2.0 and above, and Opera 9.10 include phishing filters that try to warn users when they visit malicious sites.

One proposed method of protection would be for web browsers to mark non-ASCII characters in URLs, for example with a differently colored background. This would not protect against one non-ASCII character being replaced by another, similar-looking non-ASCII character (e.g. a Greek ο by a Cyrillic о). A further solution that avoids this weakness would be to use a different color for each writing system that occurs.

Certain fonts render homoglyphs differently and can thus help to identify characters that do not belong in a URL. In Courier New, for example, some characters are distinguishable that look the same in other fonts. However, it is not yet easy for the typical user to change the font of the address bar.

In Safari, the approach is to display problematic character sets in Punycode. This can be changed through settings in the system files of Mac OS X.

The introduction of country-specific top-level domains (ccTLDs) makes spoofing more difficult. For example, the future Russian TLD ".рф" will accept only domain names written in Cyrillic and will not allow mixing with Latin characters. In generic TLDs such as ".com", however, the problem persists.
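
A display policy of this kind can be sketched as follows. This is not the browsers' actual code: the allowlist contents, the function name and the sample domains are assumptions, and Python's built-in idna codec stands in for the browser's IDNA implementation.

```python
# Hypothetical allowlist of TLDs assumed to restrict permitted characters.
TRUSTED_TLDS = frozenset({"de", "at"})


def display_form(domain: str, trusted_tlds=TRUSTED_TLDS) -> str:
    """Return the Unicode form for allowlisted TLDs, else the ASCII form."""
    ascii_form = domain.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        # The registry is trusted, so the readable form is shown.
        return ascii_form.encode("ascii").decode("idna")
    # Otherwise fall back to the xn-- form so substitutions are visible.
    return ascii_form


if __name__ == "__main__":
    print(display_form("b\u00fccher.de"))
    print(display_form("b\u00fccher.com"))
```

Adding an entry to the set corresponds to the manual extension of the permitted list mentioned above.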
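
The mixed-label rule can be approximated as shown below. The Python standard library exposes no script property, so this sketch takes the first word of each letter's Unicode name as a rough stand-in for its script; that heuristic, the function names and the sample labels are assumptions, and the locale exceptions are not modelled.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the scripts in a label via the first word of each name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()  # digits and hyphens are shared by all scripts
    }


def is_mixed_script(label: str) -> bool:
    """True if letters from more than one script occur in the label."""
    return len(scripts_in(label)) > 1


def display_label(label: str) -> str:
    """Show mixed-script labels in their ASCII (xn--) form."""
    if is_mixed_script(label):
        return label.encode("idna").decode("ascii")
    return label


if __name__ == "__main__":
    print(display_label("example"))
    print(display_label("ex\u0430mple"))  # hypothetical, Cyrillic а inside
```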
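
The per-script coloring idea amounts to splitting a name into runs of characters from the same writing system, each of which could then receive its own color. The sketch below only computes the runs; it reuses the rough heuristic of reading the script from the first word of the Unicode character name, and all names in it are assumptions.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Rough script label: first word of the Unicode name, letters only."""
    if not ch.isalpha():
        return "COMMON"  # digits, dots and hyphens
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_runs(text: str) -> list:
    """Split text into consecutive (script, substring) runs."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=script_of)
    ]


if __name__ == "__main__":
    # Hypothetical label: Latin g, Greek ο, Cyrillic о, Latin gle.
    for script, run in script_runs("g\u03bf\u043egle"):
        print(script, run)
```

Because the Greek ο and the Cyrillic о fall into different runs, a renderer that colors by run would distinguish them, which a single "non-ASCII" highlight would not.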