INTERNATIONAL SUPPORT IN SOLARIS 7

Multilanguage and multiscript features unite in Sun's new working environment

The recent arrival of electronic commerce has shifted the whole business paradigm of distributing goods and services, which traditionally relied on retail chains and direct-mail catalogs, among other channels. Although electronic commerce gives enterprises an opportunity to enlarge their market share and presence across different regions of the world, it comes with its own set of challenges. These enterprises now realize that they need distributed, server-centric applications that can handle multilingual data processing to serve individual Web clients in their native languages.

As the global economy becomes more integrated, multinational companies with headquarters and branch operations in different parts of the world see a pressing need for a unified system-software architecture that can support global networks without the incompatibilities often found among different localized versions of software. Above all, they need internationalized applications that support end-to-end computing across multilingual and multicultural barriers without modification to their core systems.

This trend toward the globalization of markets and economies is driving complex requirements in language support, particularly because so many different writing scripts are in use around the world. A script can be defined as a collection of related graphic symbols used for writing. Different regions and cultures have developed thousands of languages and hundreds of scripts to communicate within their own cultural settings. Most European languages use the Latin script, while Middle Eastern languages often use the Arabic script, and some Asian languages use Han ideographs.
For an application to communicate seamlessly across multilingual and multicultural barriers, the operating environment must be able to handle different types of writing systems. These writing systems fall into three groups.
WRITING SYSTEMS

Simple Input to Simple Output

In English and other Latin-script languages, characters are represented as single bytes, and one keystroke produces the corresponding character on screen. The input characters are stored in the order in which they are typed, and they are processed for display or printing in the same sequence. There is no difference between the way the text is stored and the way it is displayed. This is known as the Simple Input to Simple Output type of writing system, and all data processing systems are capable of handling it.

However, many of the world's languages have writing systems that are fundamentally different from those of Western languages. Complex transformations often need to take place between the input of words and sentences and their actual rendering on the screen or the printed page. Languages such as Japanese, Korean, and Chinese are based on large sets of symbols or ideographs and require multiple keystrokes before a glyph (symbol) can be displayed on the screen. Because of this complexity, these languages require pre-processing of text input before a simple output can be derived.
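The pre-processing step can be pictured as a small state machine. The Python sketch below is a toy model, not a Solaris input method: keystrokes accumulate in a pre-edit buffer and are converted to an ideograph only when the user commits. The romanized keys and the CANDIDATES table are hypothetical sample data.

```python
# Toy model of "complex input": several keystrokes become one ideograph.
# CANDIDATES is hypothetical sample data (romanized spelling -> candidate ideographs).
CANDIDATES = {
    "zhong": ["中", "钟", "种"],
    "wen": ["文", "问"],
}


class ToyInputMethod:
    """Buffers keystrokes until the user commits, then emits the chosen glyph."""

    def __init__(self, table):
        self.table = table
        self.preedit = ""    # keystrokes typed but not yet converted
        self.committed = []  # characters already passed to the application

    def type_key(self, key: str) -> None:
        # Nothing reaches the application yet; the keystroke is only buffered.
        self.preedit += key

    def candidates(self) -> list:
        # Choices offered to the user for the current pre-edit buffer.
        return self.table.get(self.preedit, [])

    def commit(self, choice: int = 0) -> None:
        options = self.candidates()
        # Convert the buffer to the selected ideograph; with no match, pass the keys through.
        self.committed.append(options[choice] if options else self.preedit)
        self.preedit = ""

    def text(self) -> str:
        return "".join(self.committed)


im = ToyInputMethod(CANDIDATES)
for key in "zhong":
    im.type_key(key)
im.commit()       # first candidate: 中
for key in "wen":
    im.type_key(key)
im.commit()       # first candidate: 文
print(im.text())  # 中文: eight letter keystrokes produced two characters
```

A real input method adds candidate windows, dictionaries, and phrase conversion, but in this model the division of labor is the same: the application sees only the committed characters.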
The second type of writing system, in which keystrokes must be converted before anything is displayed, is known as Complex Input to Simple Output.

The third type of writing system includes Complex Text Layout (CTL) languages such as Arabic, Hebrew, and Thai. In these languages, the order in which text is displayed or printed differs from the order in which it is stored. Characters may also be composed of several alphabetic elements, including vowels, consonants, diacritics, and, as in Thai, tone marks. In addition, many CTL scripts are bidirectional and context dependent, meaning that a character's displayed shape can depend on the characters around it. Such scripts require post-processing before text can be properly rendered on screen. This third type of writing system is known as Simple Input to Complex Output.

The Solaris 7 operating environment supports all three types of writing systems. It currently supports up to 37 languages and 97 locales, and support for CTL languages is new in Solaris 7. Solaris 7 is fully localized in 10 languages and is available as English and European Solaris (including German, French, Spanish, Swedish, and Italian), Simplified Chinese Solaris, Traditional Chinese Solaris, Japanese Solaris, and Korean Solaris. Each of these products includes its own culturally specific date, time, and number formats, monetary format, associated codeset, collation (sort order), input methods, and interface information (messages, icons).

For developers who need to test different locales, or to work in different locales for different projects, Solaris provides support for an additional 27 languages, for a total of 37 languages.

There are many ways to implement a multilingual computing environment, and in practice the phrase "multilingual computing" takes several different forms. It is important first to understand and distinguish among the types of so-called multilingual computing environments. There are really three types: multilanguage, multiscript, and multilingual.
In a multilanguage environment, an application inherits all the language and cultural attributes of the current locale, and text is manipulated according to the language rules of that locale. Because the locale supports only one writing script and one set of cultural attributes, the application is also limited to creating documents that contain text in one script.

As a result, the user must launch a separate instance of an application in each locale for the application to take advantage of that locale's language and cultural attributes. For example, if someone using the English-based operating environment wishes to create a document containing Chinese characters, the user must first set up the Chinese locale and then launch the application to begin creating Chinese content. To enter Russian text, the user must set up the Russian locale and launch another instance of the application, this time within the Russian locale.
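The one-locale-per-process limitation can be illustrated with Python's locale module, which wraps the C library's setlocale. In this sketch (an illustration, not Solaris code), launch_in_locale is a hypothetical helper standing in for an application instance starting up: the whole process adopts a single locale, and its cultural conventions come from that locale alone. The "C" locale is used in the example because it exists on every system; a Chinese or Russian locale name works the same way where that locale is installed.

```python
import locale


def launch_in_locale(name: str = "") -> dict:
    """Model of a multilanguage application instance starting up (hypothetical helper).

    The whole process adopts ONE locale. An empty name means "inherit it from the
    environment" (LANG / LC_* variables), as a C program does with setlocale(LC_ALL, "").
    """
    try:
        active = locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        # Requested locale not installed here: fall back to the portable "C" locale.
        active = locale.setlocale(locale.LC_ALL, "C")
    conv = locale.localeconv()
    # Every formatting and sorting call made by this instance now uses these attributes.
    return {
        "locale": active,
        "decimal_point": conv["decimal_point"],
        "currency_symbol": conv["currency_symbol"],
    }


info = launch_in_locale("C")
print(info)  # {'locale': 'C', 'decimal_point': '.', 'currency_symbol': ''}
# Working with Russian conventions means starting a second instance of the program,
# for example with LANG=ru_RU.UTF-8 in its environment (if that locale is installed).
```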
In this environment, Chinese and Russian text cannot be mixed, and the user must alternate between locales to create text in different scripts.

In a multiscript computing environment, a locale may support more than one script, but it is still limited to one set of cultural attributes. In this context, an application can create a document with text in multiple scripts. However, the application must tag or otherwise mark each run of text in the same script so that the appropriate language attributes are applied for proper input and display. The user can now create one multiscript document containing both Chinese and Russian text rather than two separate documents. Even so, the cultural attributes of the active locale still apply: if the user is in the Chinese locale, the Chinese sorting rules are applied to all of the mixed-script text.

In a multilingual computing environment, a locale can support multiple scripts and multiple sets of cultural attributes. The same application can transparently use both the language and the cultural attributes of different locales from within a single locale. A document that contains text in multiple scripts can sort each portion according to the rules for its script, rather than by the sort order of the current locale. For example, a user can apply the sorting rules of the Chinese locale to the Chinese portion of the multiscript text and the Russian sorting rules to the Russian portion.

Of the three, the multilingual environment comes closest to the ideal of multilingual computing.
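The difference between the multiscript and multilingual models can be sketched in a few lines of Python. Here tag_runs performs the multiscript step of marking each run of text by script, and multilingual_sort applies a different sorting rule to each script's portion instead of one rule for the whole document. Everything is illustrative: detecting scripts from Unicode character names, the rule that sorts ё together with е, and the five-entry PINYIN table are simplified assumptions, not Solaris locale data.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Rough script tag derived from the Unicode character name.

    Simplification: a real system would use the Unicode Script property.
    """
    name = unicodedata.name(ch, "")
    for prefix, script in (("CJK UNIFIED IDEOGRAPH", "Han"),
                           ("CYRILLIC", "Cyrillic"),
                           ("LATIN", "Latin")):
        if name.startswith(prefix):
            return script
    return "Common"  # spaces, punctuation, and anything not listed above


def tag_runs(text: str) -> list:
    """Multiscript step: mark each run of consecutive characters in the same script."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


def russian_key(word: str):
    # Assumed rule: sort ё together with е instead of after я (its code point position).
    return (word.replace("ё", "е"), word)


# Hypothetical pinyin readings; real Chinese collation needs large tables.
PINYIN = {"中": "zhong", "国": "guo", "文": "wen", "化": "hua", "家": "jia"}


def chinese_key(word: str):
    # Sort by the pinyin reading of each character (unknown characters sort as themselves).
    return tuple(PINYIN.get(ch, ch) for ch in word)


SORT_RULES = {"Cyrillic": russian_key, "Han": chinese_key}


def multilingual_sort(words: list) -> dict:
    """Multilingual step: sort each script's words with that script's own rule."""
    groups = {}
    for word in words:
        groups.setdefault(script_of(word[0]), []).append(word)
    # Scripts without a rule fall back to plain code point order (key=None).
    return {script: sorted(ws, key=SORT_RULES.get(script))
            for script, ws in groups.items()}


words = ["яма", "中国", "ёж", "文化", "жук", "国家", "ель"]
print(tag_runs("мир 中文"))      # [('Cyrillic', 'мир'), ('Common', ' '), ('Han', '中文')]
print(sorted(words))             # ['ель', 'жук', 'яма', 'ёж', '中国', '国家', '文化']
print(multilingual_sort(words))  # {'Cyrillic': ['ёж', 'ель', 'жук', 'яма'], 'Han': ['国家', '文化', '中国']}
```

With a single rule (plain code point order), ёж lands after яма because ё has a higher code point than я, and the Chinese words fall in code point order rather than pinyin order; the per-script rules handle both portions appropriately.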
The movement from multilanguage to multiscript to multilingual implies an increasing level of complexity in the underlying operating environment. For an application environment to be truly multilingual, it must bring together support for multiple scripts and for multiple sets of cultural attributes. The internationalized framework of Solaris 7 provides the means for enterprises to build such a multilingual application environment.
environment. The Solaris multilingual environment can be set up by installing
all localized products at once or by adding locales as when they are needed
by installing one locale in addition to the existing locale. This is possible because Solaris'
single internationalized binary system is localized into various languages,
such as French, Japanese, and Chinese and it can load the respective
localized messages and cultural data as and when needed. Its single internationalized
binary enables dynamic retrieval of locale-specific data and shared objects
at runtime. With this, the same copy of an enterprise's application can run on
any localized version of the Solaris operating environment without code to be changed
or recompiled. The combination of English and European locales on a single CD in Solaris 7
further enables more locale versions of Solaris to be available during a single installation.
This adds value to enterprises and developers since they do not have to install
additional packages to get European locales. Alternatively,
users can also install Sun's Global Application Developer Kit 1.0, which
is a single CD install, for the same full language support. This kit
includes comprehensive internationalization tools and documentation to help
corporate software developers and independent software vendors develop internationalized applications
for the Solaris operating environment. The further integration of Unicode locales with enhanced multiscript capabilities in the Solaris operating environment allows the application to handle text from multiple scripts in the same document without elaborate marking of text runs. Users practically create text in multiple scripts in one single document without having to switch locales. Each Unicode locale in the Solaris environment includes a base language in the UTF-8 codeset and the regional data related to the base language and its cultural conventions. This includes local formatting rules, text messages, Help messages and other related files. Each locale also supports several other scripts for input, display, code conversion and printing. In Solaris 7, it has full compliance with Unicode 2.1. Unicode support is further expanded with the addition of six new UTF-8 locales - French,
German, Italian, Spanish, Swedish, and Japanese. These locales are in addition to the
non-UTF-8 versions that were released in the earlier version of Solaris and complement
the Unicode locales already available for Korean and English. These Unicode locales support
the Common Desktop Environment (CDE) graphical user interface desktop
(including Motif and CDE libraries), and they have been enhanced with multiscript capabilities
so that users can input and display text from different writing scripts such as Japanese,
Thai, and Russian, and easily switch between the scripts without having to change or install a new locale. Enhancements have also been made to the The Unicode hexadecimal code input method lets the user generate Unicode characters by typing the hexadecimal Unicode values of the characters, while the Table lookup input method provides a lookup window on the desktop for choosing a script and and then selecting characters of the script from the available lookup table. This lookup input method is the easiest method for non-native speakers to input characters of a foreign language. Last but not least, the Solaris internationalization framework follows the concept of code set
independent design - designing applications that do not make assumptions about the underlying code set.
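Code set independence can be illustrated in Python, whose str type holds decoded characters: bytes are decoded once at the program's edge, and all further processing counts and manipulates characters rather than bytes. In this sketch, char_from_hex builds a character from its hexadecimal Unicode value (the value a user would type with the Unicode hexadecimal code input method), and convert plays the role that iconv plays for C programs. The helper names are hypothetical, and EUC-JP and GB2312 are simply examples of Asian code sets that Python's standard codecs support.

```python
# Sketch of code set independent handling: decode at the boundary, work on characters.
# The helper names are hypothetical; the codec names are Python's standard ones.


def char_from_hex(hex_value: str) -> str:
    """Turn a hexadecimal Unicode value, such as '4E2D', into the character it names."""
    return chr(int(hex_value, 16))


def char_count(data: bytes, codeset: str) -> int:
    """Count characters, not bytes, so the answer does not depend on the code set."""
    return len(data.decode(codeset))


def convert(data: bytes, from_codeset: str, to_codeset: str) -> bytes:
    """Re-encode text from one code set to another, similar in purpose to iconv."""
    return data.decode(from_codeset).encode(to_codeset)


text = char_from_hex("4E2D") + char_from_hex("6587")  # 中文
for codeset in ("utf-8", "euc_jp", "gb2312"):
    data = text.encode(codeset)
    # Byte length differs by code set (6, 4, 4); the character count is always 2.
    print(codeset, len(data), "bytes,", char_count(data, codeset), "characters")

euc = text.encode("euc_jp")
print(convert(euc, "euc_jp", "utf-8") == text.encode("utf-8"))  # True
```

The application logic never asks how many bytes a character occupies, so supporting another code set means adding a codec, not rewriting the processing code.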
By treating Unicode as just another code set, Solaris 7 lets applications handle many different code sets without extensive code rework to support specific languages. Enhancements to code set utilities such as iconv have also improved data interoperability. For instance, conversions between Solaris Asian character sets and IBM Asian character sets are now possible as a result of these iconv enhancements.

February 2000