Why is an "ß"-domain (such as "straße.de") not always resolved properly?

Whether a domain containing the letter "ß" is resolved correctly, for instance by a web browser, depends on whether that browser supports the revised IDN standard that came into effect in August 2010.

As long as some vendors have not yet updated their browsers, or users still run older browser versions that automatically convert "ß" into "ss", failures may occur. Such browsers will resolve the entry "straße.de" to the domain "strasse.de".

How can I register a domain containing the character "ß"?

The registration procedure for ß-domains is essentially identical to the procedure for any other domain. Parties wishing to register an ß-domain must contact a provider, who will then handle the registration for them. They are free to choose any provider they like.

As regards the procedure itself, the two registration phases (sunrise period and standard operations) essentially follow the same pattern. The only special rule for the sunrise period is that only persons who were registered as holders of the corresponding ss-domain at the decisive point in time, i.e. on 26 October 2010 at 15:00 CEST, are entitled to register the corresponding ß-domain during this period.

Do IDNs represent a security risk?

Not only domains that consist exclusively of ASCII characters can be (ab)used to lure users to forged websites; the same is true of Internationalized Domain Names (IDNs). At first glance, the domains of such websites look like well-known original domains, but they have been registered by third parties solely to imitate the original. The goal of such fraud attempts, also called "phishing", is to spy out confidential information such as passwords.

The risk of becoming a phishing victim is no greater for IDNs than for domains whose names consist only of ASCII characters. A typical way to fool users is to replace the letter "o" with the digit "0", the digit "1" with the letter "l", or a lower-case "l" with an upper-case "I".

The developers of the IDN standard were well aware, when they created the specification, that identical glyphs (character shapes) exist in different scripts (Latin, Cyrillic, Greek etc.), and the IDN RFCs 3490 and 3491 refer to them explicitly. To make characters easier to identify unambiguously and harder to substitute, DENIC permits only Latin characters. Thus, .de domains cannot contain characters from different scripts that look identical; at most, they can contain characters that look similar.
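The following is a minimal sketch of a script check in the spirit of DENIC's Latin-only rule, assuming that "Latin" can be approximated by characters whose Unicode name starts with "LATIN " (Python's standard library does not expose the Unicode Script property). The helper `non_latin_characters` is hypothetical and is not DENIC's actual validation. Note that it cannot catch ASCII look-alikes such as "0" for "o", which are exactly the substitutions described above.

```python
import unicodedata


def non_latin_characters(label: str) -> list[str]:
    """Return the characters of `label` that are neither classic ASCII
    letters/digits/hyphens nor named 'LATIN ...' in Unicode (hypothetical heuristic)."""
    flagged = []
    for ch in label:
        if ch.isascii() and (ch.isalnum() or ch == "-"):
            continue  # classic LDH character
        if unicodedata.name(ch, "").startswith("LATIN "):
            continue  # e.g. 'ä' is LATIN SMALL LETTER A WITH DIAERESIS
        flagged.append(ch)  # e.g. a Cyrillic look-alike
    return flagged


print(non_latin_characters("straße"))       # []
# The second character below is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
print(non_latin_characters("p\u0430ypal"))  # ['а']
```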

You can take quite a few measures to protect yourself against phishing attempts. Below you will find a list of the most important ones, which does not claim to be exhaustive.

  • Always use encrypted connections to pass on sensitive information. Reveal it only to identified, trustworthy partners.
  • Be suspicious if e-mails, blogs etc. ask you to "urgently" visit the pages of businesses you deal with (e-banking etc.).
  • Use established bookmarks rather than links in e-mails. Avoid using HTML-formatted e-mails.
  • Take warning messages about insecure, unknown or altered certificates seriously.
  • Consult the Federal Office for Information Security (BSI) at regular intervals so that you are always up to date.

As regards .de domains, IDNs are a useful enrichment: by incorporating language-specific characters, they strengthen .de's function as a country code Top Level Domain. The advantages clearly outweigh the potential security risks, which would exist even without IDNs.

What does IDN mean?

IDN stands for Internationalized Domain Name. This is a standard for domains that can contain characters other than the basic ASCII letters, digits and hyphen. These include Latin letters with umlauts (ä, ö, ü) and with diacritics such as accents (é, à …), the cedilla (ç), the tilde (ñ) or the háček (č).

Up to 2003, the characters permitted in domains were limited to certain ASCII characters (a-z, 0-9, -), since only this character set was supported by the standard Internet protocols. Countries using alphabets other than the Latin one, in particular, made considerable efforts to have the permitted character set extended by characters from other alphabets.

In 2003, the Internet Engineering Task Force (IETF) published the so-called IDNA standard in RFC 3490 ("Internationalizing Domain Names in Applications") and its companion RFCs. This standard permits the use of specific umlauts and diacritics. In August 2010, it was updated (cf. RFCs 5890 to 5894).

The standard provides that any non-ASCII character used in a domain name is transcoded into an ASCII character string in accordance with the standard's rules. For background information and the technical implementation, refer to the special FAQ.

Could you tell me a bit about the technical functioning of IDNs?

To make IDNs backwards compatible with standard Internet protocols, IDNA-compatible applications must work in the background to ensure that IDNs are converted into ASCII character strings before they are used in the Domain Name System (DNS).
To this end, the native form of an IDN, the U-label (U = Unicode), is converted into an ASCII character string by means of the so-called Punycode algorithm. The result is called an ACE (ASCII Compatible Encoding) string or A-label (A = ASCII). It starts with the ACE prefix "xn--", which indicates that the label is an IDN. The string must also encode which non-ASCII characters the U-label contains and at which positions. Thus, the A-label of an IDN is composed of the ACE prefix, followed by all classical ASCII characters of the domain name (which need no transcoding) and the code representing the non-ASCII characters.
On its website, DENIC makes available a conversion tool with which you can determine the ACE string of any IDN.
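As an illustration of the U-label to A-label conversion described above, here is a minimal sketch using Python's built-in `punycode` codec. The helper `to_a_label` is hypothetical; it assumes the input is already lower-case and normalized and performs none of the validity checks an IDNA2008 implementation would apply, so it is not a substitute for DENIC's conversion tool.

```python
def to_a_label(domain: str) -> str:
    """Convert every non-ASCII label of `domain` to its ACE form ('xn--' + Punycode).
    Assumes lower-case, normalized input; performs no IDNA2008 validity checks."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # classical ASCII label: no transcoding needed
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_a_label("straße.de"))  # xn--strae-oqa.de
print(to_a_label("zääz.de"))    # xn--zz-viaa.de
```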

Conventional domains are case-insensitive. What about IDN domains?

The DNS makes no distinction between lower-case and upper-case ASCII characters (cf. RFC 4343). However, the IDNA standard does not treat lower-case and upper-case non-ASCII characters as generally equivalent. Moreover, U-labels must be in Unicode Normalization Form C (NFC) and must not contain upper-case letters. The standard does, however, provide for an optional "normalization" (mapping) step after a user has entered data. This process takes account of the user's local settings, such as regional or language options, and maps, for example, upper-case letters to lower-case letters. A description of this mapping process can be found in RFC 5895. The Unicode Consortium offers Unicode Technical Standard #46 (http://unicode.org/reports/tr46/) as an alternative.
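A minimal sketch of such an input mapping, loosely in the spirit of RFC 5895 but covering only two of its steps (lower-casing and composition to NFC); the function name `map_user_input` is hypothetical. The last line shows why the choice of case mapping matters: Python's `str.casefold()` turns "ß" into "ss", which is exactly the behavior the revised standard no longer wants, whereas `str.lower()` keeps the "ß".

```python
import unicodedata


def map_user_input(label: str) -> str:
    """Hypothetical partial mapping: lower-case, then compose to NFC."""
    return unicodedata.normalize("NFC", label.lower())


print(map_user_input("STRAßE"))              # straße
# 'a' + COMBINING DIAERESIS (U+0308) is composed into the single character 'ä'
print(map_user_input("Za\u0308a\u0308z"))    # zääz
print("STRAßE".casefold())                   # strasse  (casefold would lose the ß)
```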

Can all applications, such as browsers and e-mail programs, handle IDNs?

The A-label (i.e. the ASCII-encoded form of a domain) usually remains invisible to users because it is needed only for technical operations. Whenever users enter domains containing non-ASCII characters, the applications they use, such as web browsers or e-mail programs, must "translate" the data.

Most applications support the characters that have been permitted since the 2003 Internet standard came into effect. The situation is different for "ß", which can only be used since the IDN standard was revised in August 2010. Up to that date, "ß" was always converted into "ss" (normalized). Because many browsers and e-mail programs will not support "ß" during a transition period but will continue to convert it into "ss", users may receive unexpected results when querying an ß-domain. For instance, if you enter the domain "straße.de" in your browser, you will either be correctly directed to the content recorded under "straße.de" (if the browser already uses software complying with the new standard) or, wrongly, to the content stored under "strasse.de" (if the browser still uses software complying with the old standard).
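The old behavior can be reproduced with Python's built-in `idna` codec, which implements the 2003 standard (RFC 3490) and therefore maps "ß" to "ss" during normalization. Encoding the label directly with the `punycode` codec preserves the "ß" and yields the A-label used under the revised standard. This is only a sketch for illustration: the direct Punycode call skips the validity checks of a real IDNA2008 implementation (the Python documentation points to the third-party `idna` package for that).

```python
# Built-in "idna" codec = 2003 standard: the ß is normalized to "ss".
old_style = "straße.de".encode("idna").decode("ascii")
print(old_style)  # strasse.de

# Encoding the label itself with Punycode keeps the ß, as the revised standard does.
new_style = "xn--" + "straße".encode("punycode").decode("ascii") + ".de"
print(new_style)  # xn--strae-oqa.de
```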

Even if you have an IDN-enabled browser and e-mail client, there is no guarantee that all other Internet users will be able to call up your IDN-based website or send an e-mail to your IDN address. There is, however, a fallback for such situations: you can use the IDN's ACE string. In our example, this means entering info@xn--strae-oqa.de instead of info@straße.de.
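A minimal sketch of that fallback, assuming only the domain part of the address needs converting; the helper `ace_email` is hypothetical. The local part before the "@" is left untouched, since non-ASCII local parts are outside the scope of IDNA.

```python
def ace_email(address: str) -> str:
    """Replace the domain part of an e-mail address by its ACE form (hypothetical helper)."""
    local, _, domain = address.rpartition("@")
    labels = [
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in domain.split(".")
    ]
    return local + "@" + ".".join(labels)


print(ace_email("info@straße.de"))  # info@xn--strae-oqa.de
```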

What precisely is the object of the contract for IDN domains?

The object of the contract concluded between DENIC and the domain holder is the U-label of the IDN domain. You must therefore state the domain in its native (not Punycode-encoded) form, converted to lower-case, in all official documents and forms.

Example: straße.de (NOT xn--strae-oqa.de)
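If you only have the A-label at hand, the U-label to be stated in documents can be recovered by decoding the part after "xn--". The sketch below uses Python's built-in `punycode` codec; `to_u_label` is a hypothetical helper that lower-cases its input first and does not validate the result.

```python
def to_u_label(domain: str) -> str:
    """Decode every 'xn--' label of `domain` back to its native form (hypothetical helper)."""
    labels = []
    for label in domain.lower().split("."):
        if label.startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_u_label("xn--strae-oqa.de"))  # straße.de
```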

How can I query an ß-domain?

Like all other domains, you can query ß-domains via DENIC's regular information services.

You can use the domain query service (web-whois) on DENIC's website or the command-line-based public-whois for this purpose.

How does DENIC's whois deal with IDNs?

DENIC's whois service on port 43 is fully compliant with RFC 3912. To support our internationalized database (for instance, contact data or IDNs), characters outside the ASCII character set are delivered in UTF-8 encoding by default. The main reason for choosing UTF-8 is its backwards compatibility with ASCII. Furthermore, according to RFC 2277, it is the preferred encoding for IETF protocols. Support for Unicode and its transformation formats, such as UTF-8, is widespread in all modern software and operating systems.

The whois protocol itself does not provide for internationalization. DENIC has therefore implemented some proprietary extensions to this protocol. They enable existing whois clients to specify the character set to be used for queries and replies (besides US-ASCII, you may use ISO-8859-1, also known as Latin-1 and very popular especially in Europe, or UTF-8). However, you are not obliged to use these extensions.
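A minimal RFC 3912 client sketch: open a TCP connection to port 43, send the query followed by CRLF, read until the server closes the connection, and decode the reply as UTF-8. The host name `whois.denic.de`, the assumption that a bare U-label sent in UTF-8 is accepted as a query, and the helper names are assumptions for illustration; DENIC's proprietary charset extensions are deliberately not used here.

```python
import socket


def build_query(domain: str) -> bytes:
    """RFC 3912: the query is a single line terminated by CRLF (sent here as UTF-8)."""
    return domain.encode("utf-8") + b"\r\n"


def whois(domain: str, server: str = "whois.denic.de", port: int = 43,
          timeout: float = 10.0) -> str:
    """Send one query and return the full reply (server name is an assumption)."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when the reply is complete
                break
            chunks.append(data)
    # UTF-8 is DENIC's default reply encoding; replace anything undecodable
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Usage (requires network access): `print(whois("straße.de"))`.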

How can I find out about IDN domains in DENIC's whois?

You may query IDNs via the domain query service on DENIC's public website. However, if you use this site, you cannot query the A-label.

For other query options, please use the command-line-based public-whois or the RRI commands CHECK and INFO.

Which characters are permitted in .de domains?

Besides the ASCII characters (the 26 Latin letters, the ten numerals and the hyphen) you may use some other characters for IDNs under .de. These include the German umlauts ä, ö and ü as well as the letter eszett ("ß") and letters with accents and other diacritics. We have compiled a list with the new additional characters valid for IDNs under .de in a table for you.

You might wonder why these particular 93 characters have been chosen and not others. There are several reasons:

  • DENIC supports all characters included in the Unicode Latin-1 Supplement and Latin Extended-A blocks which are marked as "PROTOCOL VALID" in the RFC 5892 (The Unicode Code Points and Internationalized Domain Names for Applications).
  • DENIC is an open registry free from any form of discrimination. In Germany, there is no meaningful way of drawing a line between the various character sets, since the written German language now includes characters that originated in the languages of the northern, southern and eastern parts of Europe. The sensible and appropriate solution for us therefore seemed to be to adopt two blocks that cover the necessary European character set for those languages that are based on the Latin alphabet, including some additional characters.
  • The most frequently used new characters of these character sets can be entered via standard German keyboards without requiring any additional equipment or outlay.
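The rule in the first bullet can be approximated in a few lines. The sketch below walks through the Latin-1 Supplement (U+0080 to U+00FF) and Latin Extended-A (U+0100 to U+017F) blocks and keeps lower-case letters that are stable under case folding plus NFKC, with "ß" added as the explicit exception listed in RFC 5892. This is an approximation of the RFC 5892 derivation for these two blocks only, not a full implementation; `approx_pvalid` is a hypothetical name. It does, however, arrive at the 93 characters mentioned above.

```python
import unicodedata

# Latin-1 Supplement and Latin Extended-A
BLOCKS = [(0x0080, 0x00FF), (0x0100, 0x017F)]


def approx_pvalid(ch: str) -> bool:
    """Rough approximation of RFC 5892 PVALID for the two blocks above."""
    if ch == "ß":
        return True  # U+00DF is PVALID by explicit exception in RFC 5892
    # Lower-case letters only; drop characters that change under
    # case folding + NFKC (e.g. 'ĳ' -> 'ij', 'ſ' -> 's', 'µ' -> Greek mu).
    return ch.islower() and unicodedata.normalize("NFKC", ch.casefold()) == ch


allowed = [chr(cp) for lo, hi in BLOCKS for cp in range(lo, hi + 1) if approx_pvalid(chr(cp))]
print(len(allowed))  # 93
```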

What are the rules for the minimum and maximum lengths of character strings in IDN domains?

A .de domain must consist of at least one character, and its maximum length must not exceed 63 characters. In the case of IDNs, the question immediately arises whether these length constraints refer to the IDN itself (such as straße.de) or to its conversion into an ASCII character string (xn--strae-oqa.de).

For technical reasons, the maximum length of 63 characters applies to the ASCII character string, whereas the minimum length of one character applies to the IDN.
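A minimal sketch of this split rule for a single label (without the ".de"), taking the one-character minimum from the rule stated above; `label_length_ok` is a hypothetical helper. The minimum is checked on the U-label and the maximum on the A-label, so a label of 63 "ä" characters fails even though its native form is only 63 characters long: every non-ASCII character costs at least one Punycode digit, plus the four-character "xn--" prefix.

```python
MIN_U_LABEL = 1   # minimum applies to the IDN (U-label)
MAX_A_LABEL = 63  # maximum applies to the ASCII string (A-label)


def label_length_ok(u_label: str) -> bool:
    """Check one label against the .de length rules (hypothetical helper)."""
    if u_label.isascii():
        a_label = u_label
    else:
        a_label = "xn--" + u_label.encode("punycode").decode("ascii")
    return len(u_label) >= MIN_U_LABEL and len(a_label) <= MAX_A_LABEL


print(label_length_ok("straße"))  # True
print(label_length_ok("ä" * 63))  # False
```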

Are IDNs permitted as host names for name servers and NS entries?

No. These entries are loaded directly into the name server without being converted first. The situation for host names of name servers and NS entries therefore remains exactly as before: the only permitted designations are those consisting solely of the basic ASCII character set. It is thus possible to enter the Punycode value (such as dns.xn--strae-oqa.de) as the host name in the name server entries, but not the corresponding IDN (such as dns.straße.de).

This arrangement has one big advantage: hosts with Japanese, Chinese, Cyrillic etc. names can act as name servers too, and there are no limitations to a particular character set. Such host names cannot be registered under .de, but other registries are free to choose which letters and other characters they permit and, of course, their decisions are guided by the needs of the Internet users they cater for.
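A simplified sketch of a check that a name server host name uses only the basic ASCII character set: each label may contain only letters, digits and hyphens, must not start or end with a hyphen, and is 1 to 63 characters long. The helper `is_ascii_ns_hostname` is hypothetical, and DENIC's actual validation may apply further rules.

```python
import re

# One LDH label: letters, digits, hyphens; no leading/trailing hyphen; 1 to 63 characters.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ascii_ns_hostname(host: str) -> bool:
    """True if every label of `host` is a plain ASCII LDH label (hypothetical helper)."""
    return all(_LDH_LABEL.fullmatch(label) for label in host.rstrip(".").split("."))


print(is_ascii_ns_hostname("dns.xn--strae-oqa.de"))  # True
print(is_ascii_ns_hostname("dns.straße.de"))         # False
```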

What is Unicode?

Computers can only work with numbers, so letters and other characters have to be assigned to numbers before computers can process and store them. Before Unicode was developed, there were hundreds of different coding systems, and not one of them was complete. Even for a single language (such as German), there was no system that really contained all the letters, punctuation marks and technical symbols in common use. The situation was made even more unsatisfactory by the fact that these coding systems could not be used side by side, since the same numbers were assigned to different characters. All this changed with the advent of Unicode, which ensures a unique assignment of characters to numbers, no matter what hardware and software is used. Texts that use Unicode can be exchanged throughout the world without problems or loss of information.

The original definition of Unicode and its further development are in the hands of the Unicode Consortium, a non-profit body whose purpose is to normalize and standardize the representation of text data in computing. The consortium's members include many companies and institutions from the IT sector.
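A short illustration in Python: the Unicode number (code point) of "ß" is the same everywhere, while the bytes used to store it depend on the chosen encoding, such as UTF-8 or ISO-8859-1 (Latin-1).

```python
ch = "ß"
print(f"U+{ord(ch):04X}")       # U+00DF  (the code point, identical on every system)
print(ch.encode("utf-8"))       # b'\xc3\x9f'  (two bytes in UTF-8)
print(ch.encode("iso-8859-1"))  # b'\xdf'  (one byte in Latin-1)
```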

What does "punycode" mean?

Punycode is a rule that describes how Unicode characters are uniquely assigned to ASCII character strings. You will find a technical definition of this rule in RFC 3492 (Punycode: A Bootstring Encoding of Unicode for Internationalized Domain Names in Applications).

In greatly simplified form, the conversion works as follows:

The previously normalized IDN has the prefix "xn--" placed in front of it. All non-ASCII characters are taken out. The Punycode algorithm encodes what these characters were and where they stood, and appends this coded information to the end of the remaining string, separated from it by a hyphen if any ASCII characters remain. To give an example: "zääz.de" is encoded as "xn--zz-viaa.de".
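For readers who want to see the steps above in code, here is a compact sketch of the Punycode encoder following the algorithm and parameter values of RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72, initial n 128). It encodes a single label without the "xn--" prefix and leaves normalization to the caller; it is an illustration, not a hardened implementation, and Python's built-in `punycode` codec should be preferred in practice.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def _digit(d: int) -> str:
    """Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    # 1. Copy the basic (ASCII) characters, then a delimiter if there were any.
    output = [ch for ch in label if ord(ch) < 0x80]
    b = h = len(output)
    if b > 0:
        output.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # 2. Encode the non-ASCII characters in ascending code point order.
    while h < len(label):
        m = min(ord(ch) for ch in label if ord(ch) >= n)
        delta += (m - n) * (h + 1)  # skip all states for code points below m
        n = m
        for ch in label:
            c = ord(ch)
            if c < n:
                delta += 1
            elif c == n:
                # 3. Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print("xn--" + punycode_encode("zääz") + ".de")  # xn--zz-viaa.de
```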