FAQ

Web Globalization

 
 
Multilingual Web Addresses, Internationalized Domain Names [IDN]

Multilingual web addresses are possible. However, there are many variables.
Crucially, IDN is not supported natively in Microsoft Internet Explorer 6 and earlier, which is surprising. On those versions you need to download a third-party plugin, after which it works well.

There's a very readable article on the W3C site that deals with this field.
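As a small illustration of what happens behind a multilingual address: an internationalized host name is converted to an ASCII-compatible form before it is looked up. The sketch below uses the `idna` codec from Python's standard library, which implements the older IDNA 2003 rules. The domain `bücher.example` and the helper names are made up for illustration and do not come from the text above.

```python
# Minimal sketch: convert an internationalized host name to its
# ASCII-compatible form and back, using Python's built-in "idna" codec.
# The host name used here is a hypothetical example.

def to_ascii(hostname):
    """Return the ASCII-compatible form of an internationalized host name."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """Return the Unicode form of an ASCII-compatible host name."""
    return hostname.encode("ascii").decode("idna")


print(to_ascii("bücher.example"))
print(to_unicode("xn--bcher-kva.example"))
```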



Servers

Are there internationalization issues associated with servers?

This depends on the version of HTTP that the server supports. For example, HTTP/1.0 (RFC 1945) does not support internationalization.

HTTP/1.1 (RFC 2616) introduced several mechanisms for language negotiation.
Two of its header fields are relevant here: Accept-Language allows a client to inform the server of its preferred languages, and Content-Language allows the server to state the language of the content it returns. For information on the internationalization features of HTTP/1.1 see here.



Does Accept-Language work on the Internet?

Yes it does, and its use is becoming more widespread. For example:

To properly test the above sites, you may have to change your browser language preferences.



How do I configure the "Language Preference" in my browser?

It is very simple. Note, however, that although this lets you tell the server which languages you prefer, most web sites provide content in only one language.

  • Mozilla: Edit => Preferences => Navigator => Languages
  • Firefox: Tools => Options => General => Languages
  • Internet Explorer: Tools => Internet Options => General => Languages
  • Opera: File => Preferences => Languages



How does Accept-Language work between a browser and a server?

The browser simply sends a preferred language list to the server as part of the HTTP request.
If the server is properly configured, it parses the "Accept-Language" header [which can be a list of languages in decreasing order of preference]. It then looks for the resource in the first preferred language. If the resource is not available in that language, the server tries the second preferred language in the list, and so on. This process is called content negotiation. If the resource is unavailable in any of the preferred languages, the configured default [fallback] language is served.

[Figure: sample telnet session with a web server]

The Accept-Language field in the HTTP request header indicates that the client would like Japanese, but will accept English.
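As a rough illustration of such a request, the sketch below builds the raw text of a minimal HTTP/1.1 GET request whose Accept-Language header asks for Japanese first and English second. The host `www.example.com`, the path, the q-value of 0.5 and the function name `build_request` are all assumptions chosen for the example.

```python
# Minimal sketch: build the text of an HTTP/1.1 request that carries an
# Accept-Language header. Host, path and function name are hypothetical.

def build_request(host, path, languages):
    """Return the raw request text, with CRLF line endings."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Accept-Language: {languages}",
        "Connection: close",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines)


# "ja" has the implicit quality value 1.0, "en" is given 0.5,
# so Japanese is preferred and English is acceptable.
request = build_request("www.example.com", "/index.html", "ja, en;q=0.5")
print(request)
```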

If no Accept-Language header is present in the request, the server should assume that all languages are equally acceptable, and will typically fall back to a preconfigured default language.
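The negotiation steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular server implements it: the function names are hypothetical, a regional tag such as "ja-JP" is allowed to match a resource available in plain "ja", and the "*" wildcard simply ends up at the default.

```python
# Minimal sketch of language negotiation from an Accept-Language header.
# Function names and the matching rules are assumptions for illustration.

def parse_accept_language(header):
    """Return language tags ordered by decreasing preference (q-value)."""
    entries = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0  # a tag without a q parameter counts as q=1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            entries.append((-quality, position, tag.strip().lower()))
    # Sort by quality (highest first), then by position in the header.
    return [tag for _, _, tag in sorted(entries)]


def choose_language(header, available, default):
    """Pick the first preferred language that is available, else the default."""
    for tag in parse_accept_language(header or ""):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "ja-jp" -> "ja"
        if primary in available:
            return primary
    return default


print(choose_language("ja, en;q=0.5", {"en", "fr"}, "fr"))
```

With the header "ja, en;q=0.5" and resources available only in English and French, Japanese is tried first and English is then served. A missing header falls through to the configured default.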



What support does Apache provide for serving multilingual content [content negotiation]?

The article Creating A Multilingual Web Site With Apache describes how to do it with Apache 1.3.
Documentation is also available for versions 2.0 and 2.2.
Note that content negotiation can take more than language preference into account. Other negotiable factors include encoding, character set and media type. However, I have only ever seen content negotiation based on language preference. Also, in many cases, content negotiation is handled at the application level rather than the server level.


What support does the Sun Java System Web Server (formerly Sun ONE Web Server) provide for serving multilingual content?

The article Hosting Multilingual Websites with Sun ONE 6 Web Server deals with Web Server v6.0, while the documentation on docs.sun.com covers v6.1.

 



HTML Documentation

What elements of an HTML document are localizable?
Tag | Type | Format | Applicable to PDF Files | Localizable
Title | HTML | <title>Place your title text here</title> | YES | YES
Description | Meta | <meta name="Description" content="Page summary goes here."> | YES | YES
Date | Meta | <meta name="date" content="YYYY-MM-DD"> | YES | NO
Keyword | Meta | <meta name="keywords" content="word 1, word 2, word n"> | YES | YES
Doc Type | HTML | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | | NO
HTML Lang | HTML | <html lang="ll-CC"> or <html lang="ll"> | | YES
Content-Type | HTML | <meta http-equiv="Content-Type" content="text/html; charset=xxx-x"> | | YES
Content-Language | HTML | <meta http-equiv="content-language" content="ll-CC"> | | YES
Anchor Text (for HREF links) | HTML | <a class="linkgrey" href="http://www.mozilla.org/js/">What is JavaScript?</a> | | YES
Alt Tag for Images | HTML | <img src="/en/img/nav/xxxx.gif" alt="Add image description text here." border="0"> | | YES

DATE Tag

The DATE tag can reflect either the creation date or revision date.

The DATE tag should not usually be localized, because enterprise search engines that use this field expect a fixed format.
The following format [the ISO 8601 calendar date] is recommended.

<meta name="date" content="YYYY-MM-DD">
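If pages are generated by a script, the YYYY-MM-DD value can be produced directly from a date object rather than typed by hand. The sketch below is one way to do that in Python; the helper name `date_meta_tag` is hypothetical.

```python
# Minimal sketch: emit the DATE meta tag in the recommended YYYY-MM-DD
# format. The helper name is an assumption made for this example.
from datetime import date


def date_meta_tag(day):
    """Return a DATE meta tag for the given datetime.date."""
    # date.isoformat() always yields YYYY-MM-DD, regardless of locale.
    return f'<meta name="date" content="{day.isoformat()}">'


print(date_meta_tag(date(2004, 11, 23)))
```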

HTML LANG (Language) Tag

The language attribute can be used by the browser, for example to choose an appropriate font for a document.

Format

The <html> tag should declare the content language (or natural language) of the document. For example,

<html lang="ll-CC">

where "ll" is the language code, and "CC" is the country code.

For example:

<html lang="en-US">

or

<html lang="ja-JP">

The shorter "ll" option uses the language code only: <html lang="en">. This is acceptable, though it less accurately describes the actual language of the page.

For a list of language options, see: http://www.i18nguy.com/unicode/language-identifiers.html.


CONTENT-TYPE Tag

The CONTENT-TYPE declaration tells the client application what type of content is being served. In addition, it specifies the character encoding of the content. It should be the first thing declared in the <head> tag.

If the Content-Type tag is not placed before the Title tag, there is the possibility that titles that contain non-ASCII characters will not be interpreted correctly.

The Content-Type attribute should be on every HTML page.

It is declared within the <head> tag using this format:

<meta http-equiv="Content-Type" content="text/html; charset=xxxxx">

Set the charset attribute to the correct encoding of the document. This could be UTF-8, ISO-8859-1 or many others. For example,

  • charset=UTF-8
  • charset=ISO-8859-1

Note 1: Changing the charset attribute on its own does not change the character encoding of the document.
Note 2: UTF-8 is the preferred format for page generation.
Note 3: The UTF-8 encoding can accommodate all languages.
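Note 1 is worth a concrete illustration. The declared charset is only a label; the bytes of the document stay whatever they were when it was saved. The sketch below, using the made-up sample word "café", shows what a reader would see if bytes saved as UTF-8 were labelled ISO-8859-1.

```python
# Minimal sketch: the charset label does not change the bytes of a document.
# The sample text and helper name are assumptions for illustration.

text = "café"

utf8_bytes = text.encode("utf-8")         # "é" becomes two bytes
latin1_bytes = text.encode("iso-8859-1")  # "é" becomes one byte


def decode_as(data, charset):
    """Decode bytes the way a browser would for the declared charset."""
    return data.decode(charset)


# Label and bytes agree: the text survives.
print(decode_as(utf8_bytes, "utf-8"))
# Bytes are UTF-8 but the label says ISO-8859-1: the text is garbled.
print(decode_as(utf8_bytes, "iso-8859-1"))
```

To really change the encoding, the document itself has to be re-saved or converted, and the charset declaration updated to match.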


CONTENT-LANGUAGE Tag

The content language of the document can be declared a second time. Sometimes enterprise search engines may not read [though they should] the first declaration in the <html> tag. The second declaration is as part of the meta information within the <head> tag.

<meta http-equiv="content-language" content="ll-CC">

Where "ll" is the language code and "CC" is the country code. For example, the correct content language value for Canadian French is: content="fr-CA"

For a list of language options, see: http://www.i18nguy.com/unicode/language-identifiers.html.



ALT Tag for Images

ALT tags allow text to be associated with images. ALT tags are used by screen readers and other tools that enable accessibility for disabled users. In addition, some browsers display them during mouseover.

ALT tags help pages comply with the Section 508 federal guidelines for accessibility. The ALT tag must be written in the same language as the document, that is, it must be localized.



Example Of Fully Compliant HTML Page

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en-US">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="content-language" content="en-US">
<title>Java Java Java ...</title>
<meta name="description" content="Java technology is a portfolio of products. Learn more about the powerful features and advantages of Java technology from Sun Microsystems.">
<meta name="keywords" content="word 1, word 2, word 3">
<meta name="date" content="2004-11-23">
</head>
<body .......> //etc.


Oracle is reviewing the Sun product roadmap and will provide guidance to customers in accordance with Oracle's standard product communication policies. Any resulting features and timing of release of such features as determined by Oracle's review of roadmaps, are at the sole discretion of Oracle. All product roadmap information, whether communicated by Sun Microsystems or by Oracle, does not represent a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. It is intended for information purposes only, and may not be incorporated into any contract.

The above deals with some of the more obvious parts of HTML localization. More subtle considerations can include:

  • using images appropriate to a particular locale
  • using the right colors; different colors have different meanings in different cultures.

Can I specify the character encoding of a document?

Yes you can. In the <HEAD> tag insert the following:

<META http-equiv="Content-Type" content="text/html;charset=UTF-8">

The Content-Type header should always be provided in either the HTTP or, as above, the HTML header. Preferably both. It can be the difference between a properly displayed page and garbled nonsense.  However, if the character encoding declared in the HTTP and HTML headers differs, then the HTTP header will win.
The HTML Content-Type header is especially useful if the file is saved to a user's local disk and later re-viewed. At that point there obviously won't be any HTTP header to tell the Content-Type to the browser.
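The precedence rule described above (HTTP header first, HTML header second) can be sketched as follows. This is a simplified illustration, not a description of what any browser actually does: the function names are hypothetical, the regular expression only recognises a META tag written with http-equiv before content, and the sketch returns None when neither source declares a charset.

```python
# Minimal sketch: pick the effective charset, letting the HTTP header win
# over the HTML META declaration. Names and the regex are assumptions.
import re
from email.message import Message

META_RE = re.compile(
    r'<meta\s+http-equiv=["\']?content-type["\']?\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, None if absent


def effective_charset(http_content_type, html_text):
    """Return the charset from the HTTP header, else from the META tag."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    match = META_RE.search(html_text)
    if match:
        return charset_from_content_type(match.group(1))
    return None


page = '<META http-equiv="Content-Type" content="text/html;charset=UTF-8">'
print(effective_charset("text/html; charset=ISO-8859-1", page))
print(effective_charset(None, page))
```

The second call corresponds to a page saved to local disk: with no HTTP header available, only the declaration inside the HTML remains.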




How can I display Japanese, Chinese and Korean characters on a web page?

There are two ways you can do this.

  1. Use Numeric Character References [NCRs]. NCRs allow the representation of any Unicode character using only ASCII characters. For example:
    &#12479;&#12452;&#48156;&#51088;&#49324;&#31867;&#21035;
    contains some arbitrary Japanese, Korean and Chinese characters, which represent the string:

    タイ발자사类别

    A further discussion is available here.

  2. The other, and indeed more appropriate way, is to use UTF-8 as your document's character encoding [as used for this page].
    UTF-8 can comfortably handle all languages in use today and modern browsers have good support for it.
    See How to Create UTF-8 Encoded HTML Pages.
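The NCR approach in option 1 is easy to reproduce: each character is replaced by "&#", its decimal Unicode code point, and ";". The sketch below converts the sample string from above to NCRs and back. The helper names are hypothetical, and `html.unescape` from the Python standard library is used for the reverse direction.

```python
# Minimal sketch: convert text to decimal Numeric Character References
# and back. Helper names are assumptions made for this example.
import html


def to_ncr(text):
    """Replace every non-ASCII character with a decimal NCR."""
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


def from_ncr(markup):
    """Turn character references back into the characters they stand for."""
    return html.unescape(markup)


sample = "タイ발자사类别"
print(to_ncr(sample))
print(from_ncr(to_ncr(sample)) == sample)
```

Because the output contains only ASCII characters, it displays correctly whatever character encoding the page itself declares, at the cost of being unreadable in the page source.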



CGI
  • How do I manage charset conversions from the browser to a database and back?
    See this technical tip.

Java

Where do I go for Java Internationalization Information?




Standard Activities

Which organizations are working on WWW internationalization standards?

