From Wikipedia, the free encyclopedia
An internationalized domain name
(IDN) is an Internet domain name that contains at least one
label that is displayed in software applications, in whole or in
part, in a language-specific script or alphabet, such as Chinese, Russian or the Latin-based languages with diacritics, such as French. These
writing
systems are encoded by computers in multi-byte Unicode. Internationalized domain names are
stored in the Domain Name System as ASCII strings using Punycode transcription.
The Domain Name System, which performs a
lookup service to translate user-friendly names into network
addresses for locating Internet resources, is restricted to the use
of ASCII characters, a technical limitation that initially set the
standard for acceptable domain names. The internationalization of
domain names is a technical solution to translate names written in
language-native scripts into an ASCII text representation that is
compatible with the Domain Name System. Internationalized domain
names can only be used with applications that are specifically
designed for such use, and they require no changes in the
infrastructure of the Internet.
IDN was originally proposed in December 1996 by Martin Dürst[1][2] and
implemented in 1998 by Tan Juay Kwang and Leong Kok Yong under the
guidance of T.W. Tan. After much debate and many competing
proposals, a system called Internationalizing Domain Names in
Applications (IDNA) [3] was
adopted as a standard, and has been implemented in several top-level
domains.
In IDNA, the term internationalized domain name means
specifically any domain name consisting only of labels to which the
IDNA ToASCII algorithm (see below) can be successfully applied. In
March 2008, the IETF formed a new IDN working group to
update[4] the
current IDNA protocol.
In October 2009, the Internet
Corporation for Assigned Names and Numbers (ICANN) approved the
creation of country code top-level
domains (ccTLDs) in the Internet that use the IDNA standard for
native language scripts.[5][6]
Internationalizing
Domain Names in Applications
Internationalizing Domain Names in Applications (IDNA) is a
mechanism defined in 2003 for handling internationalised domain
names containing non-ASCII
characters. These names are typically written in languages or
scripts which do not use the Latin alphabet: Arabic, Hangul, Hiragana and Kanji for instance. Although the Domain Name
System supports non-ASCII characters, applications such as e-mail and web browsers restrict the characters which
can be used as domain names for purposes such as a hostname. Strictly speaking
it is the network protocols these applications use that have
restrictions on the characters which can be used in domain names,
not the applications that have these limitations or the DNS itself.
To retain backwards compatibility with the installed base the IETF
IDNA Working Group decided that internationalised domain names
should be converted to a suitable ASCII-based form that could be
handled by web
browsers and other user applications. IDNA specifies how this
conversion between names written in non-ASCII characters and their
ASCII-based representation is performed.
An IDNA-enabled application is able to convert between the
internationalised and ASCII representations of a domain name. It
uses the ASCII form for DNS lookups but can present the
internationalised form to users who presumably prefer to read and
write domain names in non-ASCII scripts such as Arabic or Hiragana.
Applications that do not support IDNA will not be able to handle
domain names with non-ASCII characters, but will still be able to
access such domains if given the (usually rather cryptic) ASCII
equivalent.
ICANN issued guidelines for
the use of IDNA in June 2003, and it was already possible to
register .jp domains using this
system in July 2003 and .info[7] domains
in March 2004. Several other top-level domain registries started
accepting registrations in 2004 and 2005. IDN Guidelines were first
created[8] in June
2003, and have been updated[9] to
respond to phishing
concerns in November 2005. An ICANN working group focused on
country code domain names at the top level was formed in November
2007[10] and
promoted jointly by the country code supporting organization and
the Governmental Advisory Committee.
Mozilla 1.4, Netscape
7.1, Opera 7.11 were among the first
applications to support IDNA. A browser plugin is available for
Internet Explorer 6 to provide IDN support. Internet Explorer
7.0[11][12] and
Windows Vista's
URL APIs provide native support for IDN.[13]
ToASCII
and ToUnicode
The conversions between ASCII and non-ASCII forms of a domain
name are accomplished by algorithms called ToASCII and ToUnicode.
These algorithms are not applied to the domain name as a whole, but
rather to individual labels. For example, if the domain name is
www.example.com, then the labels are www,
example, and com. ToASCII or ToUnicode are
applied to each of these three separately.
The details of these two algorithms are complex, and are
specified in RFC 3490. The following gives
an overview of their function.
ToASCII leaves unchanged any ASCII label, but will fail if the
label is unsuitable for the Domain Name System. If given a label
containing at least one non-ASCII character, ToASCII will apply the
Nameprep algorithm, which
converts the label to lowercase and performs other normalization,
and will then translate the result to ASCII using Punycode[14]
before prepending the four-character string "xn--".[15] This
four-character string is called the ASCII Compatible Encoding
(ACE) prefix, and is used to distinguish Punycode encoded
labels from ordinary ASCII labels. The ToASCII algorithm can fail
in several ways; for example, the final string could exceed the
63-character limit of a DNS name. A label for which ToASCII fails
cannot be used in an internationalized domain name.
The function ToUnicode reverses the action of ToASCII, stripping
off the ACE prefix and applying the Punycode decode algorithm. It
does not reverse the Nameprep processing, since that is merely a
normalization and is by nature irreversible. Unlike ToASCII,
ToUnicode always succeeds, because it simply returns the original
string if decoding fails. In particular, this means that ToUnicode
has no effect on a string that does not begin with the ACE
prefix.
Example
of IDNA encoding
IDNA encoding may be illustrated using the example domain
Bücher.ch. “Bücher” is German for “books”, and .ch is the ccTLD of Switzerland. This
domain name has two labels, Bücher and ch. The
second label is pure ASCII, and is left unchanged. The first label
is processed by Nameprep to give bücher, and then
converted to Punycode to result in bcher-kva. It is then
prepended with xn-- to produce xn--bcher-kva. The
final domain suitable for use in the DNS is therefore
xn--bcher-kva.ch.
Top-level domain
implementation
The ICANN board approved the
establishment of an internationalized top-level domain name working
group within the Country Code Names Supporting Organisation (ccNSO)
in December 2006.[16] They
resolved in June 2007 inter alia to proceed and asked the
IDNC Working Group to prepare a proposal, which the group delivered
in June 2008, "to recommend mechanisms to introduce a limited
number of non-contentious IDN ccTLDs, associated with the ISO
3166-1 two-letter codes in a short time frame to meet near term
demand." The group proposed a methodology using ICANN's Fast
Track Process[17] based
on the ICANN charter to work with the Internet Assigned
Numbers Authority (IANA): 1) Identify technical basis of the
TLD strings and country code specific processes, select IDN ccTLD
personnel and authorities, and prepare documentation; 2) Perform
ICANN due diligence process for technical proposal and publish
method; 3) Enter delegation process within established IANA
procedures.
Starting November 16, 2009, nations and territories may apply
for IDN ccTLDs, which may be expected to be operational in
mid-2010.[6]
Non-Latin
alphabet scripts are used by more than half of the world's 1.6
billion Internet users.[6]
ICANN expects that Arabic, Chinese, and Russian domains are likely
to be the first implementations.[6]
... .مصر .Miṣr Egypt (.eg)
Timeline
- 12/1996: Martin Dürst's original Internet Draft proposing UTF5
(the first example of what is known today as an ASCII-compatible
encoding (ACE)) – UTF-5 was first defined by Martin Dürst at the
University of Zürich in [1] [2] [3]
- 03/1998: Early Research on IDN at National University of
Singapore (NUS), Center for Internet Research (formerly Internet
Research and Development Unit – IRDU) led by Prof. Tan Tin Wee (IDN
Project team – Lim Juay Kwang and Leong Kok Yong) and subsequently
continued under a team at Bioinformatrix Pte. Ltd. (BIX Pte. Ltd.)
– an NUS spin-off company led by Prof. S. Subbiah.
- 07/1998: Geneva INET'98 conference with a BoF discussion on
iDNS and APNG General Meeting and Working Group meeting.
- 07/1998: Asia Pacific Networking Group (APNG, now still in
existence [4] and distinct from a gathering known as
APSTAR [5]) iDNS Working Group formed. [6]
- 10/1998: James
Seng was recruited to lead further IDN development at BIX Pte.
Ltd. by Prof. S. Subbiah.
- 02/1999: iDNS Testbed launched by BIX Pte. Ltd. under the
auspices of APNG with participation from CNNIC, JPNIC, KRNIC, TWNIC,
THNIC, HKNIC and SGNIC led by James Seng [7]
- 02/1999: Presentation of Report on IDN at Joint APNG-APTLD
meeting, at APRICOT'99
- 03/1999: Endorsement of the IDN Report at APNG General Meeting
1 March 1999.
- 06/1999: Grant application by APNG jointly with the Centre for
Internet Research (CIR), National University of Singapore, to the
International Development Research Center (IDRC), a Canadian
Government funded international organisation to work on IDN for
IPv6. This APNG Project was funded under the Pan Asia R&D Grant
administered on behalf of IDRC by the Canadian Committee on
Occupational Health and Safety (CCOHS). Principal Investigator: Tan
Tin Wee of National University of Singapore. [8]
- 07/1999 Tout, Walid R. (WALID Inc.) Filed IDNA patent
application number US1999000358043 Method and system for
internationalizing domain names. Published 2001-01-30 [9]
- 07/1999: [10]; Renewed 2000
[11] Internet Draft
on UTF5 by James Seng, Martin Dürst and Tan Tin Wee.
- 08/1999: APTLD and APNG forms a working group to look into IDN issues
chaired by Kilnam Chon. [12]
- 10/1999: BIX Pte. Ltd. and National University of Singapore
together with New York Venture Capital investors, General Atlantic Partners, spun-off the IDN
effort into 2 new Singapore companies – i-DNS.net International
Inc. and i-Email.net Pte. Ltd. that created the first commercial
implementation of an IDN Solution for both domain names and IDN
email addresses respectively.
- 11/1999: IETF IDN Birds-of-Feather in Washington was
initiated by i-DNS.net at the request of IETF officials.
- 12/1999: i-DNS.net InternationalPte. Ltd. launched the first
commercial IDN. It was in Taiwan and in Chinese characters under the
top-level IDN TLD ".gongsi" (meaning loosely ".com") with
endorsement by the Minister of Communications of Taiwan and some
major Taiwanese ISPs with reports of over 200 000 names sold in a
week in Taiwan, Hong Kong, Singapore, Malaysia, China, Australia and
USA.
- Late 1999: Kilnam Chon initiates Task Force on IDNS which led
to formation of MINC, the Multilingual Internet Names Consortium.
[13]
- 01/2000: IETF IDN Working Group formed chaired by James Seng and Marc
Blanchet
- 01/2000: The second ever commercial IDN launch was IDN TLDs in
the Tamil Language, corresponding to .com, .net, .org, and .edu.
These were launched in India with IT Ministry support by i-DNS.net
International.
- 02/2000: Multilingual Internet Names Consortium(MINC) Proposal
BoF at IETF Adelaide. [14]
- 03/2000: APRICOT 2000 Multilingual DNS session [15]
- 04/2000: WALID Inc. (with IDNA patent pending application
6182148) started Registration & Resolving Multilingual Domain
Names.
- 05/2000: Interoperability Testing WG, MINC meeting. San
Francisco, chaired by Bill Manning and Y. Yoneya 12 May 2000. [16]
- 06/2000: Inaugural Launch of the Multilingual Internet Names
Consortium (MINC) in Seoul [17] to drive the
collaborative roll-out of IDN starting from the Asia Pacific. [18]
- 07/2000: Joint Engineering TaskForce (JET) initiated in
Yokohama to study technical issues led by JPNIC (K.Konishi)
- 07/2000: Official Formation of CDNC Chinese Domain Name
Consortium to resolve issues related to and to deploy Han
Character domain names, founded by CNNIC, TWNIC, HKNIC and MONIC
in May 2000. [19] [20]
- 03/2001: ICANN Board IDN
Working Group formed
- 07/2001: Japanese Domain Name Association : JDNA Launch
Ceremony (July 13, 2001) in Tokyo, Japan.
- 07/2001: Urdu Internet Names System (July 28, 2001) in
Islamabad, Pakistan, Organised Jointly by SDNP and MINC. [21]
- 07/2001: Presentation on IDN to the Committee Meeting of the
Computer Science and Telecommunications Board, National Academies
USA (JULY 11–13, 2001) at University of California School of
Information Management and Systems, Berkeley, CA. [22]
- 08/2001: MINC presentation and outreach at the Asia Pacific
Advanced Network annual conference, Penang, Malaysia 20 August
2001
- 10/2001: Joint MINC-CDNC Meeting in Beijing 18–20 October
2001
- 11/2001: ICANN IDN Committee
formed
- 12/2001: Joint ITU-WIPO Symposium on Multilingual Domain Names
organised in association with MINC, 6–7 Dec 2001, International
Conference Center, Geneva.
- 01/2003: Free implementation of StringPrep, Punycode, and IDNA
release in GNU Libidn.
- 03/2003: Publication of RFC 3454, RFC 3490, RFC 3491 and RFC 3492
- 06/2003: Publication of ICANN IDN Guidelines for
registries Adopted by .cn, .info, .jp, .org, and .tw
registries.
- 05/2004: Publication of RFC 3743, Joint Engineering
Team (JET) Guidelines for Internationalized Domain Names (IDN)
Registration and Administration for Chinese, Japanese, and Korean
- 03/2005: First Study Group 17 of ITU-T meeting on
Internationalized Domain Names [23]
- 05/2005: .IN ccTLD (India) creates expert IDN Working Group to
create solutions for 22 official languages
- 04/2006: ITU Study Group 17 meeting in Korea gave final
approval to the Question on Internationalized Domain Names [24]
- 06/2006: Workshop on IDN at ICANN meeting at Marrakech,
Morocco
- 11/2006: ICANN GNSO IDN Working Group created to discuss policy
implications of IDN TLDs. Ram Mohan elected Chair of the IDN
Working Group.
- 12/2006: ICANN meeting at São Paulo discusses status of lab
tests of IDNs within the root.
- 01/2007: Tamil and Malayalam variant table work completed by
India's C-DAC and Afilias
- 03/2007: ICANN GNSO IDN Working Group completes work, Ram Mohan
presents report at ICANN Lisboa meeting.[18]
- 10/2007: Eleven IDNA top-level domains were added to the root
nameservers in order to evaluate the use of IDNA at the top
level of the DNS.[19][20]
- 01/2008: ICANN: Successful Evaluations of .test IDN TLDs [21]
- 04/2008: IETF IDNAbis WG chaired by Vint Cerf continues the
work to update IDNA [22]
- 06/2008: ICANN board votes to develop final fast-track
implementation proposal for a limited number of IDN ccTLDS.[23]
- 10/2008: ICANN Seeks Interest in IDN ccTLD Fast-Track Process
[24]
- 9/2009: ICANN puts IDN ccTLD proposal on agenda for Seoul
meeting in October 2009[25]
- 10/2009: ICANN approves the registration of IDN names in the
root of the DNS through the IDN ccTLD Fast-Track process at its
meeting in Seoul, Oct. 26–30, 2009.[26]
- .مصر .Miṣr Egypt (.eg)
Top-level
domains known to accept IDN registration
- .ar: (á, à, â, é, è, ê, í, ì, ó,
ò, ô, ú, ü, ñ, ç) Starting in 2009. See FAQ
- .ac: see details
- .ae
- .at: see details
- .bd
- .bg in addition to ASCII, only
Cyrillic letters from the Bulgarian language are accepted
- .biz NeuLevel/NeuStar supports
Chinese, Danish, Finnish, German, Hungarian, Icelandic, Japanese,
Korean, Latvian, Lithuanian, Polish, Portuguese, Norwegian,
Spanish, Swedish IDN in .biz:[27]
- .br (May 9, 2005) for Portuguese
(Brazilian) names: see details
- .cat (February 14, 2006) for
Catalan names: see details
- .com see details
- .ch (March 1, 2004)
- .cl (September 21, 2005), (á, é,
í, ó, ú, ü, ñ): see details
- .cn: Simplified Chinese
characters, see details
- .de (March 1, 2004): see details
- .dk (January 1, 2004), (æ, ø, å,
ö, ä, ü, & é): see details
- .es (October 2, 2007), (á, à, é,
è, í, ï, ó, ò, ú, ü, ñ, ç, l·l): see details
- .eu (June 26, 2009) [28][29]
- .fi (September 1, 2005): see details
- .gr (July 4, 2005) for Greek
names: see details
- .hk (March 8, 2007) for Chinese characters: see details
- .hu
- .id Indonesia
- .info (March 19, 2003): see
details
- .io: see details
- .ir: see details
- .is (July 1, 2004): see details
- .jp (July 2003), for Japanese
characters (Kanji, hiragana & katakana)
- .kr (August 2003), for Korean
characters
- .li (March 1, 2004)
- .lt (March 30, 2003), (ą, č, ę,
ė, į, š, ų, ū, ž): see details
- .lv (2004): see details
- .museum (January 20,
2004): see details
- .mn Mongolian
- .net: see details
- .no (February 9, 2004): see details
- .nu: see details
- .org (January 18, 2005): see
details
- .pe (December 8, 2007): see details
- .pl (September 11, 2003): see details
- .pt (July 1, 2005) for
Portuguese characters
- .se (October 2003), for Swedish
characters, summer 2007 also for Finnish, Meänkieli, Romani, Sami, and Yiddish: see details
- .sh: see details
- .su (April 28, 2008) see details
- .tm: see details
- .tr (November 14, 2006): see details
- .tw Traditional Chinese characters: see details
- .vn Vietnamese: see details
- .ws
Non-IDNA or non-ICANN registries that support non-ASCII domain
names
There are other registries that support non-ASCII domain names.
The company ThaiURL.com in Thailand supports .com registrations via
its own modified domain name system, ThaiURL. Because these companies, and other
organizations that offer modified DNS systems, do not subject
themselves to ICANN's control,
they must be regarded as alternate DNS
roots. Domains registered with them will therefore not be
supported by most Internet service providers,
and as a result most users will not be able to look up such domains
without manually configuring their computers to use the alternate
DNS.
ASCII
spoofing concerns
The use of Unicode in domain names makes it potentially easier
to spoof web
sites visited by World Wide Web users as the visual
representation of an IDN string in a web browser may appear
identical to another, depending on the font used. For example,
Unicode character U+0430, Cyrillic small letter a, can look
identical to Unicode character U+0061, Latin small letter a, used
in English.
In December 2001 Evgeniy Gabrilovich and Alex
Gontmakher, both from the Technion Institute of Technology in Israel,
published a paper titled "The Homograph Attack",[30] which
described an attack that used Unicode URLs to spoof a website URL.
To prove the feasibility of this kind of attack, the researchers
successfully registered a variant of the domain name
microsoft.com which incorporated Russian language
characters.
These kind of problems were anticipated before IDN was
introduced, and guidelines were issued to registries to try to
avoid or reduce the problem. For example, it was advised that
registries only accept characters from the Latin alphabet and that
of their own country, not all of Unicode characters, but this
advice was neglected by major TLDs.
On February 7, 2005, Slashdot reported that this exploit was
disclosed at the hacker conference Shmoocon.[31] Web
browsers supporting IDNA appeared to direct the URL
http://www.pаypal.com/, in which the first a character is
replaced by a Cyrillic а, to the site of the well known
payment site Paypal, but actually led to a spoofed web
site with different content.
Starting with version 7, Internet Explorer was capable of
using IDNs, but it imposes restrictions on displaying non-ASCII
domain names based on a user-defined list of allowed languages and
provides an anti-phishing filter that checks suspicious Web sites
against a remote database of known phishing sites.
On February 17, 2005, Mozilla developers announced that the next
software version still has IDN support enabled, but displaying the
Punycode URLs instead,
thus thwarting some attacks exploiting similarities between ASCII
and non-ASCII characters, while still permitting access to web
sites in an IDN domain.
Since then, both Mozilla and Opera have announced that they will
be using per-domain whitelists to selectively switch on IDN display
for domain run by registries which are taking appropriate homograph spoofing attack precautions.[32] As of
September 9, 2005, the most recent version of Mozilla Firefox
as well as the most recent Internet Explorer display the spoofed
Paypal URL as "http://www.xn--pypal-4ve.com/", clearly
different from the original.
Safari's approach is to render
problematic character sets as Punycode. This can be changed by altering the
settings in Mac OS X's system files.[33]
See also
References
- ^
Dürst, Martin J. (December 10, 1996). "Internet Draft:
Internationalization of Domain Names". The Internet Engineering
Task Force (IETF), Internet Society (ISOC).
http://tools.ietf.org/html/draft-duerst-dns-i18n-00.txt. Retrieved
2009-10-31.
- ^
Dürst, Martin J. (December 20, 1996).
"URLs and
internationalization". World Wide Web Consortium. http://lists.w3.org/Archives/Public/uri/1996Dec/0038.html. Retrieved
2009-10-30.
- ^
RFC 3490, IDN in
Applications, Faltstrom, Hoffman, Costello, Internet
Engineering Task Force (2003)
- ^
John Klensin (2008). "Internationalized Domain
Names in Applications (IDNA): Protocol". IETF IDNAbis WG. http://tools.ietf.org/html/draft-ietf-idnabis-protocol.
- ^
Internet Corporation For
Assigned Names and Numbers (ICANN) (October 30, 2009). "ICANN Bringing the Languages
of the World to the Global Internet". Press release. http://www.icann.org/en/announcements/announcement-30oct09-en.htm. Retrieved
2009-10-30.
- ^ a
b
c
d
"Internet addresses set for
change". BBC News.
October 30, 2009. http://news.bbc.co.uk/2/hi/technology/8333194.stm. Retrieved
2009-10-30.
- ^
Mohan, Ram, German IDN, German Language Table,
March 2003
- ^
Dam, Mohan, Karp, Kane & Hotta, IDN Guidelines 1.0, ICANN,
June 2003
- ^
Karp, Mohan, Dam, Kane, Hotta, El Bashir, IDN Guidelines 2.0, ICANN,
November 2005
- ^
Jesdanun, Anick (Associated Press) (2
November 2007). "Group on Non-English Domains
Formed". http://abcnews.go.com/Technology/wireStory?id=3813334. Retrieved 2 November
2007.
- ^
What's New in Internet
Explorer 7
- ^
International Domain Name
Support in Internet Explorer 7
- ^
Handling Internationalized
Domain Names (IDNs)
- ^ RFC 3492, Punycode: A
Bootstring encoding of Unicode for Internationalized Domain Names
in Applications (IDNA), A. Costello, The Internet Society
(March 2003)
- ^
IANA e-mails explaining the
final choice of ACE prefix
- ^
"Proposed Final
Implementation Plan for IDN ccTLD Fast Track Process" (PDF).
ICANN. September 30, 2009. http://ccnso.icann.org/workinggroups/idnc-wg-board-proposal-25jun08.pdf. Retrieved
2009-10-30.
- ^
"IDN ccTLD Fast Track
Process". ICANN. http://www.icann.org/en/topics/idn/fast-track/.
- ^
Mohan, Ram, GNSO IDN Working Group,
Outcomes Report (PDF), ICANN
- ^
On Its Way: One of the
Biggest Changes to the Internet
- ^
My Name, My Language, My
Internet: IDN Test Goes Live
- ^
Successful Evaluations of
.test IDN TLDs
- ^
IDNAbis overview (2008)
- ^
ICANN - Paris/IDN CCTLD
discussion - Wiki
- ^
ICANN Seeks Interest in IDN
ccTLD Fast-Track Process
- ^
Proposed Final Implementation
Plan: IDN ccTLD Fast Track Process, 30 September 2009
- ^
Regulator approves
multi-lingual web addresses, Silicon Republic, 30.10.2009
- ^
NeuStar IDN details
- ^
EC adopts IDN amendments to
.eu regulation. News archive of The European Registry of
Internet Domain Names. June 26, 2009.
- ^
'.eu' internet domain to be
available also in Cyrillic and Greek alphabets. Europa.eu Press
Release. June 26, 2009.
- ^
Evgeniy Gabrilovich and Alex Gontmakher, The Homograph Attack,
Communications of the ACM, 45(2):128, February 2002
- ^
IDN hacking disclosure by shmoo.com
- ^
Mozilla IDN policy
- ^
"About Safari International
Domain Name support". http://docs.info.apple.com/article.html?artnum=301116. Retrieved
2008-08-07.
External
links