
Hosting Multilingual Web Sites with Sun's Web Server

 

Note: Sun ONE Web Server is now marketed as the Sun Java System Web Server [SJSWS]

1. Introduction

The Internet addresses a global market, and Web site design should take into consideration the requirements of users from different countries. In an international setting, Web servers need to accommodate data exchange in a wide range of character sets. HTTP was initially meant for exchange of static documents, and internationalization issues were not addressed properly. For instance, when a client submits a form to a Web server, there is no mechanism to specify the encoding of the request. Sun solves this problem and many others by providing a framework that allows dynamic content exchange in different languages and character encodings as requested by clients.

2. Static documents

Sun Java System Web Server can serve localized static documents according to the client's preferred language. This is achieved through content negotiation between the client and the Web Server. When a client sends a request to the server using HTTP, it includes the Accept-Language header describing the various languages it accepts. You can configure the Web Server to parse this language information in order to serve the matching localized pages.
How to enable Accept-Language parsing in Web Server v6.0
Manual setting: Add acceptlanguage="on" to the VSCLASS element in the server.xml file of the Web Server instance.

Through Admin Console: Log into Admin Console and click on the Manage button on the Servers tab.  Click on the Virtual Server Class tab, then click on the Edit Classes link. Change the Accept-Language value to on and submit.

How to set the browser language
Netscape:
  1. Choose Preferences from the Edit menu.
  2. Choose Languages under the Navigator heading on the dialog box. A list of preferred languages is displayed.
  3. Add languages and set the order of preference using the buttons on the right hand side.
Internet Explorer:
  1. Choose Internet Options from the Tools menu.
  2. Click on the Languages... button on the General tab.
  3. Add languages and set the order of preference using the buttons on the right hand side.
Example:
When acceptlanguage is set to on, suppose for instance that a client sends the Accept-Language header with the values fr-CH, de, when requesting the following URL:
http://www.someplace.com/somepage.html
The Web Server searches for the file in the following order:
1. By language code and country code
http://www.someplace.com/fr_ch/somepage.html
http://www.someplace.com/somepage_fr_ch.html
http://www.someplace.com/de/somepage.html
http://www.someplace.com/somepage_de.html
2. By language code only
http://www.someplace.com/fr/somepage.html
http://www.someplace.com/somepage_fr.html
3. Using the DefaultLanguage value that is defined in the magnus.conf file. For instance, if en is set to be the default, the lookup will continue as follows:
http://www.someplace.com/en/somepage.html
http://www.someplace.com/somepage_en.html
4. If none of these are found, the server tries:
http://www.someplace.com/somepage.html
 
Note:
When naming your localized files, country codes like CH and TW are converted to lower case and dashes ( - ) are converted to underscores ( _ ).
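The lookup order above can be sketched in plain Java. This is only an illustration of the documented search order, not the Web Server's actual implementation: the class and method names are hypothetical, and the sketch assumes that any ";q=" weights in the header are ignored and the languages are tried in the order they are listed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper that lists the paths tried for one request, in order.
class LocalizedLookup {

    /** Lower-cases a language tag and turns dashes into underscores: "fr-CH" -> "fr_ch". */
    static String normalize(String tag) {
        return tag.trim().toLowerCase(Locale.ROOT).replace('-', '_');
    }

    /** Adds the directory form and the file-suffix form for one language. */
    static void addForms(Set<String> out, String lang, String dir, String base, String ext) {
        out.add(dir + lang + "/" + base + ext);
        out.add(dir + base + "_" + lang + ext);
    }

    static List<String> candidates(String acceptLanguage, String defaultLanguage, String path) {
        int slash = path.lastIndexOf('/');
        String dir = path.substring(0, slash + 1);
        String file = path.substring(slash + 1);
        int dot = file.lastIndexOf('.');
        String base = dot < 0 ? file : file.substring(0, dot);
        String ext = dot < 0 ? "" : file.substring(dot);

        List<String> tags = new ArrayList<>();
        for (String part : acceptLanguage.split(",")) {
            String tag = normalize(part.split(";")[0]); // assumption: ";q=..." weights are dropped
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }

        Set<String> out = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (String tag : tags) {                // 1. each listed tag as sent
            addForms(out, tag, dir, base, ext);
        }
        for (String tag : tags) {                // 2. language code only
            int underscore = tag.indexOf('_');
            if (underscore > 0) {
                addForms(out, tag.substring(0, underscore), dir, base, ext);
            }
        }
        addForms(out, normalize(defaultLanguage), dir, base, ext); // 3. DefaultLanguage
        out.add(path);                                             // 4. the unlocalized file
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        for (String p : candidates("fr-CH, de", "en", "/somepage.html")) {
            System.out.println(p);
        }
    }
}
```

Run with the header fr-CH, de and the default language en, the sketch lists the nine paths of the example in the same order.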
 
Using Other Language Settings

The following directives in the magnus.conf file specify language defaults:

 
Directive        Values          Description
ClientLanguage   en, fr, de, ja  Specifies the language in which client messages, such as "Not Found" or "Access denied", are to be expressed. This value is used to determine which ns-httpd.db database to use for the localized messages.
DefaultLanguage  en, fr, de, ja  Specifies the language used if a resource cannot be found for the client language.
     

3. Servlet Internationalization

3.1 Request Character Encoding
When form data is submitted from a browser to the server using the POST method, the browser url-encodes the POST data and sets the Content-Type header to application/x-www-form-urlencoded, but does not send any charset information.
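The ambiguity can be reproduced with the standard library alone. The following sketch (hypothetical class name, Java 10 or later for the URLDecoder.decode(String, Charset) overload) decodes the same url-encoded value under two different assumptions about the charset and gets two different strings.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one url-encoded value, two possible interpretations.
class PostDecodingDemo {

    /** Decodes one url-encoded form value, assuming the given charset. */
    static String decode(String urlEncoded, Charset charset) {
        return URLDecoder.decode(urlEncoded, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" as a browser would send it from a UTF-8 page
        String posted = "caf%C3%A9";

        String asUtf8 = decode(posted, StandardCharsets.UTF_8);
        String asLatin1 = decode(posted, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 4 characters: the intended text
        System.out.println(asLatin1.length()); // 5 characters: the last one was split in two
    }
}
```

Nothing in the url-encoded bytes themselves says which interpretation is right, which is why the server needs the encoding from somewhere else.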
On the server side, if a servlet tries to access POST data using getParameter or getParameterValues, the servlet container does not have any information about which character encoding to use for getParameter strings. You can configure Sun Java System Web Server 6.0 to instruct the servlet container which character encoding to use for interpreting POST data strings. To do this, specify the character encoding using the parameter-encoding element in web-apps.xml:
<parameter-encoding enc="value1" form-hint-field="value2"/>
 
enc: Allowed values are auto (the default), none, or a specific encoding such as UTF8 or Shift_JIS.

- any supported Java character encoding: A specific encoding, such as UTF8 or Shift_JIS. Set this option if you know the encoding that servlet parameters use. A complete list is available here:
  http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
- none: Uses the system default encoding. Set this option if the encoding of the servlet parameter data is the same as the system default encoding.
- auto (default): Tries to figure out the proper encoding from, in order:
  1) the charset specified in the Content-Type header,
  2) the parameterEncoding attribute (see ServletRequest.setAttribute),
  3) a hidden form field defined in form-hint-field.
  Otherwise, the system default encoding is used. Set this option to prevent misinterpretation of non-ASCII characters in servlet parameters.

form-hint-field: The name of the hidden field in the form that specifies the encoding. The default is j_encoding.
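The resolution order used by auto can be written down as a small function. This is a sketch of the documented order only, not the container's code: the class, method, and parameter names are hypothetical, and the caller is assumed to have already extracted the attribute and hidden field values.

```java
import java.util.Locale;

// Hypothetical model of the documented "auto" resolution order.
class AutoEncodingResolver {

    /** Returns the charset parameter of a Content-Type value, or null if there is none. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip optional quotes
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Picks the request encoding: header, then attribute, then hidden field, then default. */
    static String resolve(String contentType, String parameterEncodingAttr,
                          String hintFieldValue, String systemDefault) {
        String fromHeader = charsetOf(contentType);
        if (fromHeader != null) {
            return fromHeader;                       // 1) Content-Type charset
        }
        if (parameterEncodingAttr != null && !parameterEncodingAttr.isEmpty()) {
            return parameterEncodingAttr;            // 2) parameterEncoding attribute
        }
        if (hintFieldValue != null && !hintFieldValue.isEmpty()) {
            return hintFieldValue;                   // 3) hidden form field (j_encoding by default)
        }
        return systemDefault;                        // otherwise: system default encoding
    }
}
```

For a typical browser POST the Content-Type carries no charset, so the first step falls through and the attribute or the hidden field decides.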
Which option you choose from the above list depends on your application. If you design your application for only one specific language, for instance Japanese using the Shift_JIS encoding, you can specify the value: <parameter-encoding enc="Shift_JIS"/>. If you want your application to be multilingual, you can choose UTF-8 since it covers most languages: <parameter-encoding enc="UTF8"/>. However, some types of clients prefer a more locale-specific encoding; in these cases UTF-8 is not the best choice. The auto choice in conjunction with the hidden field is a more flexible solution but requires more effort: every form you send to the client must carry the hidden field, so that each request submitted to the Web Server states its own encoding. If the hidden field is set correctly, the Web Container automatically does the correct conversion when you call the getParameter method.

Examples using the hidden field:

- The users who access your application are registered users. In this scenario each user has a profile which contains preferences for language and charset. The language is used for localized documents and messages. The charset value is used for data conversion when receiving requests and sending responses. After the user is authenticated, the charset value is loaded from the user profile. Every form that is sent to the user includes the hidden field.

- In this next scenario anyone can access the Web site. No profile data for the users is saved on the server. When a user accesses the application for the first time, the request.getLocale() method is called to find the preferred language of that client. Each language is mapped to a charset in a special mapping table. A new session object is then created to store the language and charset values. Each time a form is sent to the user, a hidden field is included. The value of the hidden field is obtained from the charset value stored in the session object. The user can also be offered the option of dynamically switching between languages within the same session. For example, a banner can always be included in the pages that are sent to the client. A servlet that modifies the values of language and charset can be called when the user clicks on a particular language in the banner. If a user is in an English locale and wishes to switch to a Japanese locale, they could click on a Japanese language link which invokes a servlet:
http://<host>/ChangeLocaleServlet?lang=ja&charset=Shift_JIS
The ChangeLocaleServlet will alter the session object to set the value of language to ja and charset to Shift_JIS; it will then redirect the call to a Japanese page. The user will now have a Japanese interface. Later on, when a form is sent to the client, the hidden field value will be Shift_JIS.
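The language-to-charset mapping table and the hidden field from the second scenario might look like the following sketch. The table contents, the UTF-8 fallback, and the class and method names are all assumptions made for illustration; a real application would choose its own mappings and would store the result in its session object.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical mapping table from preferred language to form charset.
class LocaleCharsetTable {

    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        // Example entries only; pick the charsets your clients actually expect.
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("en", "ISO-8859-1");
        BY_LANGUAGE.put("fr", "ISO-8859-1");
        BY_LANGUAGE.put("de", "ISO-8859-1");
    }

    /** Assumed fallback for languages that are not in the table. */
    static final String FALLBACK = "UTF-8";

    /** Maps a locale, such as the one returned by request.getLocale(), to a charset name. */
    static String charsetFor(Locale locale) {
        return BY_LANGUAGE.getOrDefault(locale.getLanguage(), FALLBACK);
    }

    /** Renders the hidden field to include in every form; j_encoding is the default hint field. */
    static String hiddenField(String charset) {
        // The charset comes from the table above, so no HTML escaping is done here.
        return "<input type=\"hidden\" name=\"j_encoding\" value=\"" + charset + "\">";
    }
}
```

After a switch to Japanese, hiddenField(charsetFor(Locale.JAPAN)) yields a field whose value is Shift_JIS, which is what the Web Container then uses to decode the submitted form.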
 

3.2 Response Character Encoding
The charset for the response can be specified with the setContentType method of the ServletResponse interface. For example, the call response.setContentType("text/html; charset=Shift_JIS") instructs the Web Container to encode the response characters as Shift_JIS bytes. The response is sent to the client with the HTTP header Content-Type: text/html; charset=Shift_JIS, whose in-page equivalent is the tag <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">; this allows the client to correctly interpret the response content. If the response includes a form, the browser will normally encode the POST data using the same encoding that is specified in the header. It is very important to always call the setContentType method with the charset value in order to ensure consistent communication between the client and the server.
 
If the charset value is not specified in the setContentType call, or if setContentType is not called at all, ISO-8859-1 is used as the default encoding.
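The practical cost of falling back to ISO-8859-1 can be shown without a servlet container. The sketch below (hypothetical class name) encodes a two-character Japanese string with plain Java; it illustrates the charset behavior only and does not reproduce how the Web Container writes a response.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of what an unsuitable response charset does to the text.
class ResponseEncodingDemo {

    /** Encodes text to bytes; unmappable characters become the charset's replacement byte. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    /** Reports whether every character of the text exists in the charset. */
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c"; // two kanji characters

        byte[] latin1 = encode(japanese, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(japanese, StandardCharsets.UTF_8);

        System.out.println(canRepresent(japanese, StandardCharsets.ISO_8859_1)); // false
        System.out.println((char) latin1[0]); // ? (the original character is lost)
        System.out.println(utf8.length);      // 6 (three bytes per character)
    }
}
```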
 
The setContentType method should always be called before calling the getWriter method to get the PrintWriter object.
 
Another servlet internationalization method is ServletResponse.setLocale(java.util.Locale), which sets the locale information of the response. Because there is no one-to-one mapping between locale and charset, a best match is applied to choose the charset for the locale that is passed to the method call.
 
The setLocale and setContentType methods should be called before the getWriter method of the ServletResponse interface is called. This ensures that the returned PrintWriter is configured appropriately for the target Locale. A call to the setContentType method with a charset component for a particular content type will override the value set via a prior call to setLocale.
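The precedence rules described in this section can be summarized in a toy model. This is not the servlet API and not the Web Container's implementation: the class is hypothetical, the locale-to-charset best match is an assumed two-entry rule, and the parsing assumes charset is the last parameter of the content type.

```java
import java.util.Locale;

// Hypothetical toy model of how the response charset is chosen.
class ResponseCharsetModel {

    private String explicitCharset; // set by a content type that carries "charset="
    private String localeCharset;   // best match chosen for the response locale

    void setLocale(Locale locale) {
        // Assumed best-match rule for illustration; a real container has its own table.
        localeCharset = "ja".equals(locale.getLanguage()) ? "Shift_JIS" : "ISO-8859-1";
    }

    void setContentType(String contentType) {
        int i = contentType.toLowerCase(Locale.ROOT).indexOf("charset=");
        if (i >= 0) {
            // Simplification: everything after "charset=" is taken as the charset name.
            explicitCharset = contentType.substring(i + "charset=".length()).trim();
        }
    }

    /** Explicit charset wins over the locale's charset; ISO-8859-1 is the default. */
    String getCharacterEncoding() {
        if (explicitCharset != null) {
            return explicitCharset;
        }
        if (localeCharset != null) {
            return localeCharset;
        }
        return "ISO-8859-1";
    }
}
```

In the model, calling setLocale(Locale.JAPAN) and then setContentType("text/html; charset=UTF-8") leaves the encoding at UTF-8, matching the rule that an explicit charset overrides the one derived from a prior setLocale call.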

4. Posting to JSPs

You can configure parameter-encoding to work the same way when you are posting to a JSP instead of a servlet. The following example demonstrates a JSP configuration of auto to read parameters which are in the Shift_JIS encoding:
<%@ page contentType="text/html; charset=Shift_JIS" %>
<html>
<head>
<title>JSP Test Case</title>
</head>
<body>
<% request.setAttribute("com.iplanet.server.http.servlet.parameterEncoding", "Shift_JIS"); %>
<h1>The Entered Name is : <%= request.getParameter("test") %> </h1>
</body>
</html>

5. Serving all the documents using the same charset

You can override a client's default character set setting for a document, a set of documents, or a directory by selecting a resource and entering a character set for that resource. The browser uses a MIME-type charset parameter in the document header to detect the charset of the requested document.
To change the charset, go to the Admin Console of the Web Server and follow these steps:
 
  • From the Class Manager, click the ContentMgmt tab.
  • In the left frame, click on the International Characters link.
  • Choose The Entire Server from the resource picker to apply your change to the whole class, or navigate to the document root for a specific virtual server, or to a specific directory within a specific virtual server.
  • Set the Character set (charset) for all or part of the server. If you leave this field blank, the charset is set to NONE.
  • Click OK.

6. Conclusion

    Sun Java System Web Server 6.0 simplifies the management of international data. With some planning and evaluation of your customers, you can configure the web server to satisfy their document requirements. For more information about Sun Java System Web Server, see the product information page at:
    http://wwws.sun.com/software/products/web_srvr/home_web_srvr.html
