
International Support in Solaris 7

 
The following article about Solaris internationalization was published in Multilingual Computing & Technology.
  • INTERNATIONAL SUPPORT IN SOLARIS 7: Multilanguage and multiscript features unite in Sun's new working environment (Multilingual Computing & Technology Volume 11 Issue 1)

INTERNATIONAL SUPPORT
IN SOLARIS 7
Multilanguage and multiscript features
unite in Sun's new working environment

The rise of electronic commerce has changed how goods and services are distributed, a role traditionally filled by retail chains and direct mail catalogs, among others. Although electronic commerce gives enterprises an opportunity to enlarge their market share and presence across different regions of the world, it comes with its own set of challenges. These enterprises now realize that they need distributed, server-centric applications that can handle multilingual data processing to serve individual Web clients in their native languages.

As the global economy becomes more integrated, multinational companies with headquarters and branch operations in different parts of the world are seeing a pressing need for a unified system-software architecture that can support global networks without the incompatibilities often found between different localized versions of software. Most of all, they need internationalized applications that can support end-to-end computing across multilingual and multicultural barriers without modification to their core systems.

This trend towards globalization of markets and economies is driving complex requirements in the area of language support, particularly when there are so many different writing scripts being used in the world that need to be taken into consideration.

A script can be defined as a collection of related graphic symbols used for writing. Different regions and cultures have developed thousands of languages and hundreds of scripts to communicate within their own cultural settings. Most European languages use the Latin script, while Middle Eastern languages often use the Arabic script, and some Asian languages use Han ideographs.

For an application to communicate seamlessly across multilingual and multicultural barriers, the operating environment has to handle the different types of writing systems. These writing systems can be grouped into three categories:

  • Simple Input to Simple Output
  • Complex Input to Simple Output
  • Simple Input to Complex Output

SUPPORT FOR WORLD
WRITING SYSTEMS

Simple Input to Simple Output

English and other Latin-script languages are represented with single-byte characters, and a single keystroke produces the corresponding character on screen. The input characters are stored in the order in which they are typed, and they are processed for display or printing in the same sequence. There is no difference between the way the text is stored and the way it is displayed. This is known as the Simple Input to Simple Output type of writing system. All data processing systems are capable of handling this.

Complex Input to Simple Output

However, many world languages have writing systems that are fundamentally different from that of western languages. Complex transformations often need to take place between the input of words and sentences and the actual rendering of these words and sentences on the screen or the printed page. Languages such as Japanese, Korean, and Chinese are based on a set of symbols or ideographs and require multiple keystrokes before a glyph (symbol) can be displayed on the screen. Because of the complexity involved, these languages require pre-processing of text input before a simple output can be derived. This second type of writing system is known as Complex Input to Simple Output.

Simple Input to Complex Output

The third type of writing system includes Complex Text Layout (CTL) languages such as Arabic, Hebrew, and Thai. In these languages, the layout used to display and print text differs from the order in which the text is stored. A displayed character can be composed of several alphabetic elements, including vowels, consonants, diacritics, and tone marks, as in the Thai language. Also, many of these CTL scripts are bi-directional and context dependent. Such scripts require post-processing before text can be properly rendered on screen. This third type of writing system is known as Simple Input to Complex Output.
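The gap between stored text and displayed text can be inspected with a few lines of Python. This is a minimal sketch that relies only on the Unicode character database in Python's standard `unicodedata` module; it is not how Solaris implements CTL support. The helper names `display_cells` and `bidi_classes` are made up for this illustration, and counting non-spacing marks is only a rough stand-in for real text layout.

```python
import unicodedata

def display_cells(text):
    """Count code points that occupy their own cell, skipping non-spacing marks."""
    return sum(1 for ch in text if unicodedata.category(ch) not in ("Mn", "Me"))

def bidi_classes(text):
    """Return the Unicode bidirectional class of each stored code point."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Thai: consonant THO THAHAN + vowel sign SARA II + tone mark MAI EK.
# Three code points are stored, but the marks stack on the consonant.
thai = "\u0e17\u0e35\u0e48"
print(len(thai), display_cells(thai))

# Latin A, Hebrew ALEF, Arabic ALEF, stored in typing (logical) order.
# A layout engine uses these classes to decide the visual direction.
mixed = "A\u05d0\u0627"
print(bidi_classes(mixed))
```

The Thai string is stored as three code points yet fills a single display cell, and the right-to-left classes reported for the Hebrew and Arabic letters are what a layout engine consults when it reorders stored text for display.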

The Solaris 7 operating environment can support all three types of writing systems. It currently supports up to 37 languages and 97 locales. Support for CTL languages is the most recent addition to Solaris 7. Fully localized in 10 languages, it is available as English and European Solaris (including German, French, Spanish, Swedish, and Italian), Simplified Chinese, Traditional Chinese, Japanese Solaris and Korean Solaris. Each of these products includes its own culturally specific date/time/number formats, monetary format, associated codeset, collation (sort order), input methods, and interface information (messages, icons).

For developers who need to test different locales or to work in different locales for different projects, Solaris provides support for an additional 27 languages, making a total of 37 languages available for their use.

MULTILINGUAL COMPUTING MODEL

There are many ways to implement a multilingual computing environment. As a result, the phrase "multilingual computing" takes on different meanings in practice. It is important that we first understand and distinguish among the different types of so-called multilingual computing environments. There are really three types available:

  • multilanguage
  • multiscript
  • multilingual

Multilanguage Environment

In a multilanguage environment, an application inherits all the language and cultural attributes of the current locale with text manipulated according to the language rules of the current locale. Because the locale is limited to supporting one writing script and one set of cultural attributes, the application is also limited to creating documents containing text in one script.

In a multilanguage environment, the user must launch a separate instance of an application in different locales for the application to take advantage of differing language and cultural attributes. For example, if someone using the English-based operating environment wishes to create a document containing Chinese characters, the user must first set up the Chinese locale and then launch the application to begin creating Chinese content. In order to enter Russian text, the Russian locale must be set up and another instance of the application has to be launched - this time within the Russian locale. In this environment, the Chinese and Russian text cannot be mixed, and the user must alternate between locales to create text of different scripts.

Multiscript Environment

In a multiscript computing environment, a locale may support more than one script, but the locale is still limited to one set of cultural attributes. In this context, an application can create a document with text in multiple scripts. However, the application must tag or otherwise mark each separate run of text of the same script to apply the appropriate language attributes for proper input and display. The user can now create one multiscript document containing both Chinese and Russian text rather than creating two separate documents. However, the cultural attributes of the active locale still apply. Therefore, if the user is in the Chinese locale, the Chinese sorting rules will be applied to the mixed script text.

Multilingual Environment

In a multilingual computing environment, a locale can support multiple scripts and multiple cultural attributes. The same application can now have the ability to transparently make use of both the language and cultural attributes of the different locales within a single locale. A document that contains text in multiple scripts can now sort text according to its script, rather than the sort order of the current locale. To illustrate further, a user can apply the sorting rules of the Chinese locale to the Chinese portion of the multiscript text and then call upon the Russian sorting rules to apply to the Russian portion of the multiscript text. The multilingual environment is the closest one can find in a search for ideal multilingual computing. The movement from multilanguage to multiscript to multilingual implies an increasing level of complexity in the underlying operating environment. Therefore, for an application environment to be truly multilingual, it would have to bring together these different components. Solaris 7's internationalized framework provides the means for enterprises to develop a multilingual application environment.
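The idea of sorting each portion of a multiscript text by the rules of its own script can be sketched in Python. This is an illustration only, not the Solaris mechanism: `script_of`, `SORT_KEYS`, and `sort_by_script` are hypothetical names, the script is guessed from the first word of each character's Unicode name, and the sort keys are simple stand-ins for the collation data a real locale would supply.

```python
import unicodedata
from collections import defaultdict

def script_of(word):
    """Guess a word's script from the Unicode name of its first character."""
    return unicodedata.name(word[0]).split()[0]   # e.g. "CYRILLIC", "CJK", "LATIN"

# Hypothetical per-script sort keys; real collation rules come from locale data.
SORT_KEYS = {
    "CYRILLIC": str.casefold,   # ignore case when ordering Cyrillic words
    "LATIN": str.casefold,
}

def sort_by_script(words):
    """Group words by script, then sort each group with that script's own key."""
    groups = defaultdict(list)
    for word in words:
        groups[script_of(word)].append(word)
    return {
        script: sorted(group, key=SORT_KEYS.get(script, lambda s: s))
        for script, group in groups.items()
    }

words = ["яблоко", "中", "Борис", "一", "арбуз"]
for script, group in sort_by_script(words).items():
    print(script, group)
```

Each run of text is ordered independently, so the rule chosen for the Cyrillic words has no effect on the Han characters. Under a single active locale, by contrast, one sort order would be applied to the whole mixed list.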

SOLARIS MULTILINGUAL FRAMEWORK

The Solaris multilingual environment can be set up by installing all localized products at once or by adding locales as and when they are needed, installing one locale in addition to the existing locale. This is possible because Solaris' single internationalized binary system is localized into various languages, such as French, Japanese, and Chinese, and it can load the respective localized messages and cultural data as and when needed. Its single internationalized binary enables dynamic retrieval of locale-specific data and shared objects at runtime. As a result, the same copy of an enterprise's application can run on any localized version of the Solaris operating environment without its code being changed or recompiled.

Single installation

The combination of English and European locales on a single CD in Solaris 7 further enables more locale versions of Solaris to be available during a single installation. This adds value to enterprises and developers since they do not have to install additional packages to get European locales. Alternatively, users can also install Sun's Global Application Developer Kit 1.0, which is a single CD install, for the same full language support. This kit includes comprehensive internationalization tools and documentation to help corporate software developers and independent software vendors develop internationalized applications for the Solaris operating environment.

Unicode locales support

The further integration of Unicode locales with enhanced multiscript capabilities in the Solaris operating environment allows an application to handle text from multiple scripts in the same document without elaborate marking of text runs. In practice, users can create text in multiple scripts in a single document without having to switch locales.

Each Unicode locale in the Solaris environment includes a base language in the UTF-8 codeset and the regional data related to the base language and its cultural conventions. This includes local formatting rules, text messages, Help messages and other related files. Each locale also supports several other scripts for input, display, code conversion and printing. Unicode support in Solaris 7 is fully compliant with Unicode 2.1.
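Because UTF-8 is a variable-width codeset, characters from different scripts take different numbers of bytes, unlike the single-byte characters of Latin-script codesets. The following Python sketch shows this with one sample character per script; the `samples` table and the `utf8_length` helper are illustrative names, not part of Solaris.

```python
def utf8_length(ch):
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))

# One sample character per script, written as escapes to avoid ambiguity.
samples = {
    "Latin": "A",          # U+0041
    "Cyrillic": "\u042f",  # U+042F CYRILLIC CAPITAL LETTER YA
    "Thai": "\u0e01",      # U+0E01 THAI CHARACTER KO KAI
    "Han": "\u65e5",       # U+65E5
}

for script, ch in samples.items():
    print(script, utf8_length(ch))
```

An application that assumes one byte per character will miscount or split such text, which is one reason multiscript support has to be built into the underlying framework rather than added per application.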

Unicode support is further expanded with the addition of six new UTF-8 locales - French, German, Italian, Spanish, Swedish, and Japanese. These locales are in addition to the non-UTF-8 versions that were released in the earlier version of Solaris and complement the Unicode locales already available for Korean and English. These Unicode locales support the Common Desktop Environment (CDE) graphical user interface desktop (including Motif and CDE libraries), and they have been enhanced with multiscript capabilities so that users can input and display text from different writing scripts such as Japanese, Thai, and Russian, and easily switch between the scripts without having to change or install a new locale.

en_US.UTF-8 locale support

Enhancements have also been made to the en_US.UTF-8 locale, which is an American English-based locale with multiscript processing support for characters of many different languages. In Solaris 7, there are a total of eight input modes in the en_US.UTF-8 locale: English, Western/Eastern/Northern Europe, Cyrillic, Greek, Hebrew, Thai, Arabic, Unicode hexadecimal code input method, and Table lookup input method. The end user can input characters from any combination of these scripts (and from the entire Unicode coding space) and at the same time also switch between these input modes using the Compose key or a Control key sequence.

The Unicode hexadecimal code input method lets the user generate Unicode characters by typing the hexadecimal Unicode values of the characters, while the Table lookup input method provides a lookup window on the desktop for choosing a script and then selecting characters of the script from the available lookup table. This lookup input method is the easiest method for non-native speakers to input characters of a foreign language.
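The mapping that a hexadecimal code input method performs can be sketched in a few lines of Python. This shows only the conversion from typed hexadecimal digits to a character, not the Solaris input method itself; `from_hex_input` is a hypothetical helper name.

```python
import unicodedata

def from_hex_input(typed):
    """Convert typed hexadecimal digits (e.g. '0E01') into the Unicode character."""
    code_point = int(typed, 16)            # raises ValueError on non-hex input
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside the Unicode range: {typed!r}")
    return chr(code_point)

for typed in ("0041", "0E01", "05D0"):
    ch = from_hex_input(typed)
    print(typed, unicodedata.name(ch))
```

This method requires the user to know each code value in advance, whereas a lookup table lets the user pick characters by sight.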

Last but not least, the Solaris internationalization framework follows the concept of code set independent design: designing applications that do not make assumptions about the underlying code set. By treating Unicode as just another code set, Solaris 7 lets applications handle different code sets without extensive code rework to support specific languages. Enhancements to code set utilities, such as iconv, have also enabled better data interoperability. For instance, conversions between Solaris Asian character sets and IBM Asian character sets are now possible as a result of these iconv enhancements.
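The kind of conversion that iconv performs can be approximated with Python's built-in codecs. This is a sketch of the general idea, not the Solaris iconv utility: the `convert` function is a hypothetical name, and the `euc_jp` and `shift_jis` codec names are Python's, chosen here as stand-ins for two Japanese code sets rather than the specific Solaris and IBM code sets the article mentions.

```python
def convert(data, from_codeset, to_codeset):
    """Re-encode bytes from one code set to another, using Unicode as the pivot."""
    return data.decode(from_codeset).encode(to_codeset)

text = "\u65e5\u672c\u8a9e"                 # the word for "Japanese language"
euc_bytes = text.encode("euc_jp")           # bytes as one system might store them
sjis_bytes = convert(euc_bytes, "euc_jp", "shift_jis")

print(euc_bytes.hex(), sjis_bytes.hex())    # different byte sequences
print(sjis_bytes.decode("shift_jis") == text)
```

The code set names are passed in as parameters and nothing else in the function depends on them, which is the essence of code set independence: supporting another code set means supplying another converter, not rewriting the application.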

Multilingual Computing & Technology Volume 11 Issue 1
February 2000
