
Creating Multilingual Web Sites with Apache

 

If you have a business website, you may need to rethink your customer profile. Browser hits no longer come only from English speakers. This article describes how to provide a localized, multilingual website with the Apache 1.3 server.

Click. Click. Who's There?

English speakers aren't the only people using the World Wide Web. The Internet is a global resource: people throughout the world surf its pages, and they're getting more demanding. It happened with commercial software; now it's happening with web pages. Potential customers want to see web content in their own language.

Web browsers and servers can negotiate content, and one aspect of this negotiation is the preferred language. You could ignore this and set up separate sites such as greatrugs.com, greatrugs.co.jp, and greatrugs.co.ru to handle English, Japanese, and Russian. However, that means installing and maintaining three separate websites. Why not use content negotiation to set up one site that handles all languages?

Content negotiation requires that both the web browser and server participate. This involves setting user preferences on the browser and configuring the server to handle content language requests.

Setting Up the Browser

Web browsers communicate their preferred languages through the Accept-Language HTTP request header. As a user, all you need to do is set the preference. As a website administrator, you must either depend on the user to set it appropriately or give the user a choice through a menu or another mechanism. For the purpose of this article, assume that the browser's preferences are set correctly.

The steps to set the browser's language preferences are simple and differ only slightly between the two major browsers, Internet Explorer 5.5 and Netscape Navigator 6.0. The following instructions assume you want Japanese content in Navigator 6.0:

  1. Choose Edit > Preferences from the menu. The Preferences dialog box appears. See Figure 1.
  2. Select Navigator > Languages from the dialog box. A list of preferred languages is displayed. See Figure 1.
  3. Add languages and set their order of preference using the buttons on the right-hand side.
Figure 1 Setting the Preferred Languages in the Preferences Dialog Box


 

Figure 2 Adding Preferred Languages in the Preferences Dialog Box

After you configure these preferences, the browser asks the server for content in the preferred languages, in the order you set. In this example, Japanese is the preferred language. If Japanese content is not available, English is the next choice, followed by Spanish.

These settings cause the browser to create HTTP requests that include your language preferences. For example, the request header for the settings above might look like this:

Accept-Language: ja, en, es [1]
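The footnote below notes that real headers may also carry a quality factor, q. As a rough illustration of how a server could turn such a header into an ordered preference list, here is a minimal Python sketch. `parse_accept_language` is a hypothetical helper written for this article, not code taken from Apache, and it ignores language-range subtleties such as matching `en-us` against `en`.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags in an Accept-Language header, most preferred first.

    Entries without an explicit q parameter default to q=1.0, entries with
    q=0 (or an unparsable q) are dropped, and ties keep their header order.
    """
    weighted = []
    for position, entry in enumerate(header.split(",")):
        parts = [part.strip() for part in entry.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue  # skip empty entries, e.g. from a trailing comma
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag))
    # Highest quality first; the original position breaks ties.
    return [tag for _, _, tag in sorted(weighted)]


print(parse_accept_language("ja, en, es"))              # ['ja', 'en', 'es']
print(parse_accept_language("es;q=0.5, ja, en;q=0.8"))  # ['ja', 'en', 'es']
```

Sorting on the negated quality plus the original position keeps the browser's own ordering whenever weights are equal, which matches the simple case shown in the header above.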

Configuring Apache

The Apache web server runs on many operating systems, including Solaris, Linux, and Microsoft Windows 2000. The Windows distribution includes an installer, and the other installations are just as easy, since they typically involve only unpacking a "tarball", an archive file containing the complete distribution. Regardless of which version you install, make sure that the mod_negotiation module is compiled into the binary. It is included by default, but it is worth confirming. This module is responsible for interpreting and acting on the Accept-Language request header (among other things) sent by the client browser.
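One way to confirm that the module is compiled in is to run the server binary with the `-l` flag, which lists the compiled-in modules; `mod_negotiation.c` should appear in that list. The Python sketch below automates the check. The binary name `httpd` is an assumption (the Windows build of Apache 1.3 is typically `Apache.exe`), and a server that loads the module as a shared object through `LoadModule` will not list it under `-l`.

```python
import subprocess


def lists_mod_negotiation(module_listing: str) -> bool:
    """Return True if the output of `httpd -l` names mod_negotiation.c."""
    return any(line.strip() == "mod_negotiation.c"
               for line in module_listing.splitlines())


def check_binary(binary: str = "httpd") -> bool:
    """Run `<binary> -l` (list compiled-in modules) and inspect its output.

    `binary` is an assumed name or path; adjust it for your installation.
    Both stdout and stderr are searched in case the listing goes to either.
    """
    result = subprocess.run([binary, "-l"], capture_output=True, text=True)
    return lists_mod_negotiation(result.stdout + result.stderr)
```

For example, `check_binary("/usr/local/apache/bin/httpd")` returns True when the module is compiled in; that path is only an illustration.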

Once you have confirmed that the binary has the negotiation module, you must choose between two ways to deliver content in various languages:

  • type maps
  • MultiViews


A type map file lists all the variants of a particular URI. If you have Japanese, English, and Spanish versions of foo, you must list them all in the map file to make them available to a client. Here is an example of a type map file that contains variants of one URI in three languages:

URI: foo

URI: foo.en
Content-type: text/html
Content-language: en

URI: foo.ja
Content-type: text/html;charset=iso-2022-jp
Content-language: ja

URI: foo.es
Content-type: text/html
Content-language: es
 
 
When your browser asks for the resource foo, the server searches the type map for the variant that best matches the client's language preference. If you prefer English, the server provides foo.en; if you prefer Japanese, it provides foo.ja.
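To make the lookup concrete, here is a simplified Python sketch that parses the type map above and picks a variant by language alone. The helper names are hypothetical, and Apache's real negotiation also weighs media type, charset, encoding, and quality factors, so treat this as an illustration of the idea rather than the server's algorithm.

```python
from typing import Dict, List, Optional

# The type map from the example above.
TYPE_MAP = """\
URI: foo

URI: foo.en
Content-type: text/html
Content-language: en

URI: foo.ja
Content-type: text/html;charset=iso-2022-jp
Content-language: ja

URI: foo.es
Content-type: text/html
Content-language: es
"""


def parse_type_map(text: str) -> List[Dict[str, str]]:
    """Split a type map into records of lower-cased header name -> value."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        records.append(current)
    return records


def choose_variant(records: List[Dict[str, str]],
                   preferences: List[str]) -> Optional[str]:
    """Return the URI of the variant matching the earliest preferred language."""
    variants = [r for r in records if "content-language" in r]
    for lang in preferences:
        for record in variants:
            if record["content-language"].lower() == lang:
                return record["uri"]
    return None  # no variant in any preferred language


records = parse_type_map(TYPE_MAP)
print(choose_variant(records, ["ja", "en", "es"]))  # foo.ja
print(choose_variant(records, ["fr", "es"]))        # foo.es
```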

Type map files work well and offer many options, but creating and maintaining one for every resource on your web server can be a significant burden. Unless you have a specific reason to use type maps, consider enabling the MultiViews option instead.

MultiViews is an option set in Apache's httpd.conf file. It is not enabled by default, so you must turn it on explicitly. It is also a per-directory option, so it affects only the directories you specify. The following lines enable it for the document root directory:

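# This should be changed to whatever you set DocumentRoot to.
#
<Directory "D:/bin/ApacheGroup/Apache/htdocs">

# Note that "MultiViews" must be named *explicitly* --- "Options All"
# doesn't give it to you.
# "Options MultiViews" replaces any other Options for this directory;
# "Options +MultiViews" adds it to the inherited options instead.
#

    Options MultiViews
</Directory>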
 
 
MultiViews is a quick and easy way to enable language variants of HTML or other resource files. When a client requests foo.html, a MultiViews-enabled server first looks for that file; if it doesn't exist, the server searches for foo.html.*. In effect, MultiViews builds a type map automatically from the results of that search. In this example, you provide three language variants of foo.html: foo.html.en, foo.html.ja, and foo.html.es. The server's search for foo.html.* finds all of them and serves the one that best matches the user's preferences.
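The file-search step can be sketched in a few lines of Python. `multiviews_candidates` is a hypothetical helper that only gathers candidate variants from a directory; it does not rank them, and it ignores details of the real module such as how extensions map to content types.

```python
import glob
import os
from typing import List


def multiviews_candidates(docroot: str, requested: str) -> List[str]:
    """Approximate the search step of a MultiViews lookup.

    If `requested` exists, no negotiation happens and it is the only
    candidate. Otherwise every file named `requested.*` becomes a variant.
    """
    path = os.path.join(docroot, requested)
    if os.path.exists(path):
        return [requested]
    # glob.escape protects any wildcard characters in the requested name.
    pattern = glob.escape(path) + ".*"
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

With foo.html.en, foo.html.ja, and foo.html.es in the directory and no foo.html, a request for foo.html yields all three variants for the negotiation step to rank.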

For MultiViews to recognize which extension corresponds to which language, add AddLanguage directives to the httpd.conf file. This directive associates extensions such as .es, .ja, and .en with the ISO 639 language codes es, ja, and en. You can use any extensions, but the standard two-letter ISO 639 codes are recommended. Under this convention, foo.html.es is the resource localized for es, the ISO code for Spanish.

The following lines add language extension support for Japanese, Spanish, and English:

AddLanguage es .es
AddLanguage ja .ja
AddLanguage en .en
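As a sketch of what these directives give the server, the Python fragment below mirrors the three AddLanguage lines in a dictionary and derives a variant's language from its extensions. The table and function name are illustrative assumptions; scanning every extension rather than only the last one is what lets a name such as foo.html.ja.jis, used later in this article, still be recognized as Japanese.

```python
from typing import Dict, Optional

# Mirrors the three AddLanguage lines above: extension -> language code.
ADD_LANGUAGE: Dict[str, str] = {".es": "es", ".ja": "ja", ".en": "en"}


def variant_language(filename: str) -> Optional[str]:
    """Return the language implied by any of a file's extensions, if any."""
    for part in filename.split(".")[1:]:  # skip the base name itself
        lang = ADD_LANGUAGE.get("." + part.lower())
        if lang is not None:
            return lang
    return None  # no extension carries a language
```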
 
 
If you also want to support preferred character set encodings, use the AddCharset directive. It maps a filename extension to a character set, so the server can offer a localized file in several encodings and take the client's charset preference into account when negotiating content.
AddCharset Big5 .big5
AddCharset ISO-8859-1 .iso8859-1
AddCharset WINDOWS-1251 .cp1251
AddCharset ISO-2022-JP .jis
AddCharset UTF-8 .utf8
 
 
If the user's browser also indicates a preferred character set, the server can take that into account as well. With this configuration, you can provide a Japanese foo.html in two different encodings:
  • foo.html.ja.jis
  • foo.html.ja.utf8
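Browsers state their charset preferences in the Accept-Charset request header. The following Python sketch, using hypothetical helpers and only the two extensions from the list above, shows how a server might choose between foo.html.ja.jis and foo.html.ja.utf8 given an ordered list of acceptable charsets. It deliberately omits quality factors and HTTP's default-charset rules, which the real module handles.

```python
from typing import Dict, List, Optional

# A subset of the AddCharset lines above: extension -> charset name.
ADD_CHARSET: Dict[str, str] = {".jis": "ISO-2022-JP", ".utf8": "UTF-8"}


def variant_charset(filename: str) -> Optional[str]:
    """Return the charset implied by any of a file's extensions, if any."""
    for part in filename.split(".")[1:]:
        charset = ADD_CHARSET.get("." + part.lower())
        if charset is not None:
            return charset
    return None


def pick_by_charset(variants: List[str], accepted: List[str]) -> Optional[str]:
    """Return the variant whose charset appears earliest in `accepted`."""
    wanted = [c.upper() for c in accepted]  # charset names compare case-insensitively
    best, best_rank = None, len(wanted)
    for name in variants:
        charset = variant_charset(name)
        if charset is not None and charset.upper() in wanted:
            rank = wanted.index(charset.upper())
            if rank < best_rank:
                best, best_rank = name, rank
    return best


japanese = ["foo.html.ja.jis", "foo.html.ja.utf8"]
print(pick_by_charset(japanese, ["utf-8", "iso-2022-jp"]))  # foo.html.ja.utf8
```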
Finally, add one more directive. The server needs to know which language to provide when the user has no preference or when content negotiation ends in a tie. The LanguagePriority directive in httpd.conf handles this:
# LanguagePriority allows you to give precedence to some languages
# in case of a tie during content negotiation.
# Just list the languages in decreasing order of preference.
#
<IfModule mod_negotiation.c>
    LanguagePriority ja en es
</IfModule>
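The fallback behavior can be sketched as follows. This Python fragment assumes a simplified model in which client preferences win when they match and LanguagePriority decides only when the client states no preference. `choose_language` is a hypothetical name, and the real module also applies the priority list to break ties among equally acceptable variants.

```python
from typing import List, Optional

# Mirrors "LanguagePriority ja en es" above.
LANGUAGE_PRIORITY: List[str] = ["ja", "en", "es"]


def choose_language(available: List[str], client_prefs: List[str]) -> Optional[str]:
    """Pick a language from `available` under the simplified model above."""
    if client_prefs:
        for lang in client_prefs:
            if lang in available:
                return lang
        # Preferences were given but none is available; what the real
        # server does in this case is not modeled here.
        return None
    # No client preference: fall back to the server's priority list.
    for lang in LANGUAGE_PRIORITY:
        if lang in available:
            return lang
    return None
```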
 
 

Using What We Know

After enabling MultiViews on the document root directory, test the configuration. Create a foo.html.en file in the document root, D:/bin/ApacheGroup/Apache/htdocs/. The file foo.html.en looks like this:
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
    <meta name="author" content="John O'Conner">
    <meta name="description" content="English file">
    <title>Hello</title>
  </head>
  <body>
    <font size="+3"> Hello, world!</font><br>
  </body>
</html>
 
 
Assuming that the server is running and its name is sharkfin, you can access the file in the browser as shown in Figure 3.
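You can also check the negotiation without reconfiguring a browser by sending the Accept-Language header yourself. The Python sketch below uses the standard library's `urllib.request`. The URL http://sharkfin/foo.html assumes the server name used in this example, and the helper names are hypothetical.

```python
import urllib.request


def build_request(url: str, languages: str) -> urllib.request.Request:
    """Build a GET request that states language preferences as a browser would."""
    return urllib.request.Request(url, headers={"Accept-Language": languages})


def fetch_language(url: str, languages: str) -> str:
    """Fetch `url` and return the Content-Language header from the response, if any."""
    with urllib.request.urlopen(build_request(url, languages)) as response:
        return response.headers.get("Content-Language", "")
```

Calling `fetch_language("http://sharkfin/foo.html", "ja, en, es")` should report the language of the variant the server chose, provided the server sends a Content-Language header for it. Changing the second argument lets you try other preference orders quickly.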

Figure 3 The English Version of the File is Shown

Remember that the browser has already been configured with the language preferences ja, en, es. So far, the server has only an English version of foo.html, so it provides what it can: foo.html.en. Once the page is localized into Japanese, however, the result changes. Here is the Japanese version, foo.html.ja:

<html>
  <head>
    <!-- Save this file in Shift_JIS so its bytes match the declared charset. -->
    <meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
    <meta name="author" content="John O'Conner">
    <meta name="description" content="Japanese file">
    <title>Hello in Japanese</title>
  </head>
  <body>
    <font size="+3">今日は、世界!</font><br>
  </body>
</html>
Note: To display the Japanese characters in the code above, your system must support the Shift_JIS character set and have Japanese fonts installed.

Now when you display the same URL, foo.html, the server returns the Japanese version, because Japanese is the first language preference. Figure 4 shows the result of browsing this page after adding foo.html.ja.

Figure 4 The Japanese Version of the File is Shown

At this point, the root document directory contains these files:

  • foo.html.en
  • foo.html.ja
Note that you must NOT provide a foo.html file without a language extension. Doing so defeats the MultiViews algorithm, which negotiates only when the requested file does not exist. Because foo.html itself is absent, the server negotiates among the foo.html.* variants and serves the best match for the user's preferences.
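Because a stray file without a language extension silently disables negotiation for that name, it can be worth auditing the directory. The Python sketch below is a hypothetical helper that checks a single directory (not recursively) and flags any file X for which X.* variants also exist, since a request for X would be served directly and never negotiated.

```python
import os
from typing import List


def conflicting_files(docroot: str) -> List[str]:
    """List files that shadow their own variants in `docroot`.

    A file such as foo.html is flagged when files such as foo.html.en
    also exist, because the server serves the exact match instead of
    negotiating among the variants.
    """
    names = set(os.listdir(docroot))
    return sorted(
        name for name in names
        if any(other.startswith(name + ".") for other in names)
    )
```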

Summary

You can provide multilingual, localized pages from a single Apache web server by enabling MultiViews support. With this feature enabled, the server will negotiate the localized content with the client browser and provide the preferred localized pages if they exist. You can localize any or all of your web pages with this feature. Setting up a single domain name that provides multilingual content is easier to maintain than multiple domains with their own localizations. Also, a single domain name is easier for customers to remember.



[1] The "quality factor", q, is intentionally omitted to simplify the discussion. See the HTTP/1.1 specification, RFC 2068, for more details on request headers: http://www.w3.org/Protocols/rfc2068/rfc2068.

© 2001 John O'Conner. John O'Conner is a staff engineer specializing in Java internationalization.
