Introduction

StarOffice from Sun is a cross-platform office productivity suite that includes the following applications:
All of these tools are built on a robust graphics framework and can be customized using the StarOffice Basic language, which enables developers to create a wide range of office applications.

The Challenge

StarOffice developers at Sun faced the following challenges:
The Solution

This section describes how the new StarOffice internationalization framework addresses these challenges:
How to Support Global Users

To provide support for global users in multilingual environments, StarOffice developers have built a universal internationalization framework. The framework provides a rich set of APIs for internationalizing StarOffice applications using the Universal Network Objects (UNO) component model. UNO is an interface-based object model, like COM or CORBA, that is used to integrate all StarOffice components. UNO is designed to be as efficient as COM while offering additional features.

You can now modify a locale's behavior or add a new locale without modifying or recompiling the source code. The StarOffice internationalization framework is platform-independent and is accessible to any CORBA or COM component, irrespective of its programming language, through the UNO remote bridges for CORBA and OLE. The framework is also reusable outside StarOffice, for example, in GNOME.

How to Replace the Single-Byte Framework with a Multilingual Unicode-Based Framework

The decision was made to integrate Unicode character handling in the upcoming version of StarOffice to support all European, Asian, and BiDi languages, and to base the new internationalization framework on UNO. In just four weeks, the StarOffice development team succeeded in changing about 7 million lines of C++ code to Unicode. This was possible largely because, in StarOffice 5.2, character representation was already handled by a platform-independent C++ String class. Another advantage was that more than 80% of the code was system-independent.

The Unicode conversion process involved the following stages:
The resource file system in StarOffice 5.2 was already able to read UTF-8 strings and display Unicode; however, developers implemented additional Unicode file and clipboard I/O for the upcoming version of StarOffice.

Unicode Conversion Problems

To change to Unicode, you need a good base Unicode String class. This class must be able to insert, search for, compare, and replace ASCII characters, because you do not want to change all strings to Unicode. For example, in RTF or HTML files, you only want to convert specific content. You do not want to convert tokens, which are ASCII characters, to Unicode characters. New code converters were introduced to convert from Unicode to the legacy code set and vice versa.

Another requirement was the ability to load and save data in files, such as configuration files, database files, and other file formats. For example, you must be able to save data in its own binary format and in third-party formats, such as text, RTF, and WinWord.

An Extensible and Pluggable Internationalization Framework

The StarOffice architecture is based on a layered approach to allow easy porting to different platforms. There are four well-defined layers:
Figure 1: StarOffice Layered Architecture

The upcoming version of StarOffice will enable developers to create customized applications using modules in the framework layer. Although these modules are developed in C++, they are all UNO components. UNO components can be used by modules written in other languages and can run on different hosts. The new internationalization framework consists of several UNO components, which means that it is accessible to any component in any language.

StarOffice internationalization requirements include:
To enable multilingual document processing on all platforms, StarOffice uses Unicode to represent characters. The internationalization framework must provide a character classification mechanism that supports Unicode 3.0. The character classification API must be able to handle multiple code points per character.

Encapsulation

In the future, the StarOffice development team plans to support up to 76 locales. All locale-sensitive behavior must be encapsulated in the internationalization framework APIs so that additional locales can be supported. For example, users might want to search a document for a particular string. The search might include an option to perform a case-insensitive search; however, case-insensitive searches are irrelevant for Japanese documents. For Japanese, it makes more sense to search without distinguishing between katakana and hiragana characters. These options are locale-sensitive and must be encapsulated within the internationalization framework.

Pluggable Locale Support

Because StarOffice supports many locales, locale support is prone to error. The new internationalization framework must make it easier to add or modify locale behavior. If a customer finds a bug in the behavior of a specific locale, the internationalization framework must enable you to remove the error-prone module and replace it with a new one without affecting the StarOffice binary. Because the internationalization framework is developed using the UNO component model, locale behavior can be easily modified in the UNO repository.

Collation

Users can choose from more than one collation algorithm to sort data. This means that the collation APIs must provide an interface to query the collation algorithms available for a locale and enable users to select the one that they want to use. Collation can be used by end users to sort data, as well as internally by the application to sort file names and font names.
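The query-and-select requirement for collation can be illustrated with a small Python sketch. This is not the StarOffice API: the registry `COLLATORS`, the functions `list_collators` and `sort_with`, and the algorithm names are all hypothetical, and Unicode NFKC normalization is used only as a rough stand-in for folding half-width and full-width forms together (it also folds other compatibility distinctions).

```python
import unicodedata

# Hypothetical registry: locale -> {algorithm name -> sort-key function}.
# The locales, names, and algorithms are illustrative assumptions only.
COLLATORS = {
    "en_US": {
        "codepoint": lambda s: s,
        "ignore_case": lambda s: s.casefold(),
    },
    "ja_JP": {
        "codepoint": lambda s: s,
        # NFKC maps half-width katakana and full-width Latin letters to
        # their ordinary forms, approximating a width-insensitive sort.
        "ignore_width": lambda s: unicodedata.normalize("NFKC", s),
    },
}


def list_collators(locale):
    """Return the names of the collation algorithms offered for a locale."""
    return sorted(COLLATORS.get(locale, {}))


def sort_with(locale, algorithm, items):
    """Sort items with the algorithm the caller selected for the locale."""
    key = COLLATORS[locale][algorithm]
    return sorted(items, key=key)
```

A caller would first ask `list_collators("ja_JP")` which algorithms exist and then pass the chosen name to `sort_with`, so the application never hard-codes a locale-specific option.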
The collation rule that an application uses to sort and display font or file names does not have to be very strict. For example, in the Japanese locale, the application can ignore the difference between half-width and full-width characters. Such options are locale-specific and cannot be specified in the application. The collation API must provide abstract, yet easy-to-use, options that map onto locale-sensitive options.

Number Formatter

In StarOffice 5.2, the number formatter makes extensive use of locale data and number format codes provided by the internationalization framework. For the upcoming version of StarOffice, new keyword symbols, parsing methods, and string output methods have been developed to enable the number formatter to make use of the new calendar API and to use different calendars in the same format code. One particular goal was not only to create a calendar format that behaves the same way as that in Japanese Microsoft Excel, for example, but also to display any combination of calendar systems for a locale, as long as the locale data provides information about them.

Calendar

The calendar API provides an interface for performing date arithmetic based on various calendars. Although most locales support the Gregorian calendar by default, many locales support additional calendars. For example, the Japanese locale supports the Emperor Era calendar as well as the Gregorian calendar; hence, the calendar API should have an interface to query the available calendars for any locale.

Break Iterator

The internationalization framework must provide APIs to iterate over a string by character, word, line, and sentence. Iterating over characters is essential for two reasons:
Line breaking must be highly configurable in desktop publishing applications. The line breaking algorithm must be able to find a line break with or without a hyphenator. The line breaking API must also be able to parse special characters that are illegal if they occur at the end or beginning of a line. The character, word, and line breaking algorithms are locale-sensitive and must be pluggable.

The New StarOffice Internationalization Framework Architecture

The new StarOffice internationalization framework includes the following major components:
Each component of the framework is a UNO component. The following figure shows the interaction between the various components:
Figure 2: StarOffice Internationalization Framework Architecture

Since all components are locale-sensitive, each component is registered under a unique service name. StarOffice defines the naming convention for each component. For example, the service name convention for a break iterator object is as follows: If you run StarOffice in the Thai locale, it loads the following service: com.sun.staroffice.i18n.imp.th_TH.breakiterator

Developers can register their Thai break iterator module against the service name, and StarOffice will automatically load it at runtime. By following the naming convention, any locale-sensitive component can be plugged into the StarOffice binary repository dynamically. Hence, StarOffice locale behavior can be enhanced without recompiling.

Even though every locale-sensitive component can be registered using a unique service name, it is not practical to register all components for all locales. For example, the break iterator for the other Spanish locales, that is, for other Spanish-speaking regions, is the same as that for the Spain locale (es_ES). The modules referred to as stubs in Figure 1 provide fallback functionality; that is, a service that is guaranteed to be available. StarOffice modules use the stub modules to parse the locale information. A stub module attempts to locate a locale-sensitive module using the service naming convention. If such a service is unavailable, the stub module tries once more without the country name. If that attempt also fails, it loads a default module. For example, to locate a break iterator for the French Canadian locale, the stub first looks for a break iterator service for fr_CA. If the break iterator for fr_CA is unavailable, the stub module looks for a break iterator for fr. If that is not available either, it falls back to the default break iterator.

Adding a New Locale
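Because modules are located purely by service name, adding or refining a locale comes down to registering a module under the matching name. The following Python sketch illustrates the stub lookup order described above (full locale, then language only, then default). It is a simplified model, not the UNO registry API: the functions `candidate_names` and `resolve` are hypothetical, the prefix is taken from the Thai example in the text, and the language-only and `default` name forms are assumptions made for illustration.

```python
# Prefix copied from the Thai example service name in the text.
PREFIX = "com.sun.staroffice.i18n.imp"


def candidate_names(locale, service):
    """List service names to try, from most to least specific."""
    language = locale.split("_")[0]
    names = [f"{PREFIX}.{locale}.{service}"]
    if language != locale:
        # Second attempt: drop the country part (assumed name form).
        names.append(f"{PREFIX}.{language}.{service}")
    # Last resort: a default module ("default" is a hypothetical name).
    names.append(f"{PREFIX}.default.{service}")
    return names


def resolve(locale, service, registered):
    """Return the first candidate name present in the set of registered names."""
    for name in candidate_names(locale, service):
        if name in registered:
            return name
    raise LookupError(f"no {service} service registered for {locale}")


# Example: only a French and a default break iterator are installed.
registered = {
    f"{PREFIX}.fr.breakiterator",
    f"{PREFIX}.default.breakiterator",
}
before = resolve("fr_CA", "breakiterator", registered)  # falls back to fr

# Plugging in a French Canadian module changes the result without
# touching any calling code.
registered.add(f"{PREFIX}.fr_CA.breakiterator")
after = resolve("fr_CA", "breakiterator", registered)
```

In this model, `before` names the fr module and `after` names the fr_CA module, which mirrors how a newly registered locale module takes precedence over the fallback without recompiling the application.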
Conclusion

StarOffice is not just an office productivity suite; it is a completely object-oriented platform for developing any cross-platform desktop application. The new StarOffice internationalization framework is Unicode-based and offers a rich set of APIs, which meet the requirements of existing applications and are also generic enough to be used for any application developed on this platform. The APIs encapsulate all localization behavior inside the internationalization framework. This means that localization developers can add new locales or enhance existing locale behavior to meet regional market requirements without modifying the StarOffice binary. The internationalization framework is accessible to CORBA/UNO components, which makes the StarOffice internationalization framework universal.

Further Information

For a demo of StarOffice multilingual features, why not visit our booth at the 19th International Unicode Conference in San Jose, California, September 10-14.

References
OpenOffice Localization and Internationalization Project