
Solaris 8 Internationalized Operating System

 


A single, global software product that scales to your requirements.



Business today is increasingly complex and competitive. To thrive and grow, a company must expand its market presence, globalize its products, and improve its information technology infrastructure. It must ensure that its products are language-neutral and can be easily tailored for each target market. IT organizations must address the technology issues and complexities associated with supporting multinational operations. Information must be retrievable in multiple languages, and the integrity of data must be maintained as it moves through a distributed, heterogeneous computing environment. Incompatible data encoding formats, which vary by locale and system, must be managed together. Meeting the needs of such a multilingual environment requires a unified system architecture that can support local needs without the incompatibilities often found between different localized versions of software. Developers need a framework on which they can build applications that can be deployed without modification across the global network while still supporting the needs of local users.

The Solaris single-binary, internationalized framework offers developers a set of APIs to help meet these needs and provides an environment for multilingual computing.


Solaris Single Binary Model


Sun released Solaris 8, a new version of the Solaris operating environment, early this year. Every release of the Solaris Operating System, up to and including version 8, has been based on a single binary architecture. The Solaris operating environment is an example of a product that supports both internationalization and localization: it is a single binary that has been localized into multiple language versions, each of which contains cultural and linguistic support for its specific language. In Solaris, a locale combines a base language, the country (territory) of use, and the applicable codeset. An English-speaking user in the United States can select the en_US locale (English for the United States), while an English-speaking user in Great Britain can select the en_GB locale (English for Great Britain, with different time and currency formatting).
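To make the locale idea concrete, here is a minimal C sketch that formats one fixed date according to whichever locale the caller names, using the standard setlocale() and strftime() calls. The function name format_date_in_locale and the chosen date are illustrative assumptions, not part of Solaris, and whether a locale such as en_US or en_GB can be selected depends on which locales are installed on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper (not a Solaris API): format 15 March 2000 using the
 * date conventions of the named locale.
 * Returns 1 if the locale was applied, 0 if it was unavailable and the
 * "C" locale was used instead, and -1 if the buffer was too small.
 */
int format_date_in_locale(const char *locale_name, char *buf, size_t buflen)
{
    struct tm date = {0};
    int applied = 1;

    assert(locale_name != NULL && buf != NULL && buflen > 0);

    date.tm_year = 100;  /* years since 1900, so 2000 */
    date.tm_mon  = 2;    /* months since January, so March */
    date.tm_mday = 15;
    date.tm_wday = 3;    /* Wednesday */
    date.tm_yday = 74;   /* days since 1 January */

    /* Only the date/time category is switched; other categories are untouched. */
    if (setlocale(LC_TIME, locale_name) == NULL) {
        setlocale(LC_TIME, "C");  /* assumed fallback when the locale is missing */
        applied = 0;
    }

    /* %x asks for the locale's preferred date representation. */
    if (strftime(buf, buflen, "%x", &date) == 0)
        return -1;
    return applied;
}
```

Calling the function with "C" produces the month-first form 03/15/00. With en_GB installed, a day-first ordering would be expected instead, which is the kind of difference the en_US and en_GB locales capture without any change to the application binary.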


With the release of the Solaris 8 multilingual product, Sun continues its commitment to the single binary model and provides developers with the enhanced infrastructure and interfaces needed to create internationalized software. The Solaris 8 multilingual product includes full support for Unicode 3.0, which is also defined in ISO/IEC 10646-1:2000. It is a major international release, with a single packaging approach to universal language coverage, expanded locales, intuitive European locale repackaging, and many other enhancements. This article focuses on the new single global product structure in the Solaris 8 Operating System and on how the operating environment has been enhanced to provide better data interoperability, user extensibility, and customizability.



Single Global Product Structure

The Solaris 8 operating environment now includes support for more than 123 locales, covering 37 languages, all in a single, global product structure. The new locales include Icelandic (ISO8859-1) and Russian (ANSI1251). The latter is in addition to the existing Russian (ISO8859-5) locale found in previous versions of Solaris, and it provides native support for the Microsoft data encoding.


This single global packaging approach greatly simplifies the development and testing of applications for international markets. In the past, users had to order multiple media kits to test their applications in multiple locale environments. With the new product structure, all language environments are available on the Base CD, while fully translated messages and online help are available on a separate Language CD. In other words, the standard Solaris 8 base product provides an English interface that can input, display, and print text in a target language, including multibyte locales, while the Solaris 8 Multilingual product adds a Language CD that provides the localized interface and translated documentation. This single structure eliminates the need to purchase optional media kits each time developers want to set up a different non-English development or production environment.


User-intuitive Installation and Setup

Users will also find setup and installation significantly easier, whether they install a single language or the full range of 37 languages packaged with the Solaris 8 operating environment. The new installation interface lets users install only those regions for which they require locale support. In previous releases, locale support was tied to the software cluster installed; with the new interface, users and developers can add and remove locales as needed. In short, changes to the packaging on the Solaris 8 CD have reduced the storage requirements for a mixed-language installation, and the redesigned install interface makes language selection and grouping far more intuitive.


Enhanced Internationalization Framework

The Solaris 8 Operating System expands the set of Unicode (UTF-8) locales provided in Solaris 7. It supports numerous additional European Unicode locales as well as additional Unicode locales for Simplified Chinese and Traditional Chinese. All of these Unicode locales now support version 3.0 of the Unicode standard, which further enhances the Solaris 8 multiscript environment. With full support for Complex Text Layout (CTL) scripts as well, developers and users can display text from multiple languages in a single environment, including proper rendering of bidirectional and context-sensitive shaping scripts such as Arabic, Hebrew, and Thai in the Unicode locales.
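One practical consequence of working in a UTF-8 locale is that a character may occupy more than one byte, so byte counts and character counts differ. The following C sketch illustrates the point with a hand-written counter; the function name utf8_char_count is a hypothetical helper rather than a Solaris interface, and the code assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the characters (code points) in a
 * NUL-terminated string that is assumed to be valid UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte that
 * does not match that pattern starts a new character.
 */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For example, the Thai letter encoded as the three bytes E0 B8 81 counts as one character, whereas strlen() would report three. Production code running in a UTF-8 locale would more commonly rely on the standard multibyte functions such as mbstowcs() rather than a hand-written loop.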


Enhanced User Extensibility and Customizability


In many languages of the world, especially those based on the Latin, Cyrillic, and Greek scripts, there is usually no difference between how text is stored for data processing and how it is presented on a display device or a printer. The text is read horizontally from left to right, and the characters are stored in a manner identical to how they are processed.


However, not all languages of the world share these characteristics. Some Middle Eastern, South Asian, and Southeast Asian languages, such as Arabic, Hebrew, Hindi, Thai, and Lao, have complex characteristics like bidirectionality and context-sensitive shaping that require special processing of the text before it can be rendered on a display device.


Because of these complex characteristics, it has been quite difficult to perform the special pre-processing for these languages with a single unified algorithm in a computer program. In the past, such processing was done by a software component called a layout engine (LE), with each LE designed to support a single language or script. In the Solaris 8 Operating System, we have created a Universal Multiscript Layout Engine (UMLE), a layout engine that is codeset independent, user customizable, and able to handle multiple scripts when providing layout services. It is codeset independent because it can handle not only Unicode but also any other codeset, which makes it universal. It is user customizable because users can create new behavior or customize existing behavior without touching the layout engine source. Compared to the locale-specific layout engine approach, the UMLE approach is more flexible, multiscript capable, and easier to maintain, and it allows new complex text layout scripts to be added.


Another area of enhancement is the creation of a user-extensible code conversion utility, geniconvtbl. In the past, developers could not add their own custom-defined code conversions without time and resources from Sun's engineering team. With the geniconvtbl utility, customers can now define and use their own code conversions in Solaris. Codeset converters are table driven, so developers can easily create new ones and add them to the system. This capability improves an application's ability to deal with incompatible data types, particularly data generated by proprietary or legacy applications. The utility also supports modification of existing Solaris codeset conversions. The following diagram illustrates how geniconvtbl works in the Solaris operating environment.


[Diagram: geniconvtbl]


The geniconvtbl utility permits user-defined and user-customizable codeset conversions through standard system utilities and interfaces such as iconv(1) and iconv(3). It accepts a customer's code conversion definition in a flat text file and generates a binary code conversion table file that Solaris can use to perform the conversion.
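Because conversions are reached through the standard iconv(3) interface, an application calls a user-defined conversion in the same way as a built-in one. The C sketch below shows that calling pattern with iconv_open(), iconv(), and iconv_close(). The wrapper name convert_codeset is a hypothetical helper, and the codeset names passed to it are assumptions: the names accepted differ between platforms, so check the iconv documentation for the system in use.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around iconv(3): convert the NUL-terminated string
 * `in` from codeset `from` to codeset `to`, writing a NUL-terminated
 * result into `out`. Returns the number of bytes written (excluding the
 * terminator), or (size_t)-1 if the conversion is unknown, the input is
 * invalid, or the output buffer is too small.
 */
size_t convert_codeset(const char *from, const char *to,
                       const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    /* Some systems declare the input argument as const char **. */
    char *inptr = (char *)in;
    char *outptr = out;
    size_t inleft, outleft, rc;

    assert(from != NULL && to != NULL && in != NULL);
    assert(out != NULL && outsize > 0);

    inleft = strlen(in);
    outleft = outsize - 1;  /* reserve one byte for the terminator */

    cd = iconv_open(to, from);  /* note the order: target first, then source */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    *outptr = '\0';
    return (size_t)(outptr - out);
}
```

A call such as convert_codeset("ISO-8859-1", "UTF-8", text, buf, sizeof buf) converts Latin-1 text to UTF-8 on systems that recognize those names. A conversion generated with geniconvtbl would be selected by passing its own codeset names in the same way.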


Enhanced Data Interoperability

In addition to enhanced user extensibility, data interoperability with non-Solaris environments has been improved in the Solaris 8 operating environment through the addition of many new data conversion utilities. These include code converters for Japanese mainframe data types, code converters for Microsoft data encodings (including user-defined characters), Asian UTF-8 interoperability (for example, for China and Korea), and converters for various Unicode encoding formats and for international and de facto industry standard codesets, as shown in the diagram below.

[Diagram: Unicode]
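To illustrate what a converter between Unicode encoding formats does internally, the following C sketch turns a single Unicode code point into its UTF-8 byte sequence. It is a hand-written illustration, not the code used by the Solaris converters. The function name utf8_encode is hypothetical, and the sketch follows the current UTF-8 definition, which limits sequences to four bytes and rejects surrogate values.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into `out` and returns the number written,
 * or 0 if `cp` is a surrogate or lies beyond U+10FFFF.
 */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80UL) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;                         /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                   /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, the euro sign U+20AC becomes the three bytes E2 82 AC. In an application, the same result is normally obtained through iconv(3) with the appropriate Unicode codeset names rather than through custom code.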


In Summary...


The long-standing single binary, internationalized Solaris system, together with the new single global product structure, enables enterprises to install multiple locales that reside concurrently on a remote server and to add or remove locales as needed. At the same time, users at the client end can easily customize their own native-language environment and still run applications that use different locales on the same desktop.


For application developers and independent software vendors, the Solaris 8 Operating System provides greater extensibility and better data interoperability in a heterogeneous network computing environment. With an enhanced set of codeset converters, users can define and use their own code conversions in Solaris 8 and can interoperate with legacy and PC-based multilingual data. Together with an integrated set of enhanced Unicode locales, support for the latest Unicode standard (version 3.0), and enhanced interface protocols, this makes developing true multilingual and web-based applications in the dot-com space even easier on Solaris 8.