These guidelines cover I18N practices for products
and applications written in C or C++. Many of the same
principles apply to the Java language. Not every guideline is
appropriate for every product, application, or version of
the Solaris operating environment.
For more information, see the references on individual pages.
Code Set Independence: Wide Characters and Avoiding EUC Dependencies
-
Use wide character functions and declarations to process characters or
strings that may contain multibyte data. This data may include file paths,
command line options, file contents, or other program input or output.
Wide character functions exist to replace various multibyte string, time,
and I/O functions, and other functions convert between wide character and
multibyte representations (for example, mbstowcs() and wcstombs()). There
are also wide character versions of the isXX() functions, such as
iswalpha() and iswdigit(). See the man pages for more information.
-
You can sometimes use the mblen() function in these situations, but use
caution: mblen() reports only the length of the character that starts at a
given byte, so it can scan a string forward but not backward.
-
Avoid EUC dependencies, including the use of EUC-related
include files, functions, and declarations. Also avoid the csetlen, csetcol,
and csetno functions.
-
Do not hardcode any non-ASCII characters.
-
Processing that may require wide character functions and declarations
includes collation, sorting, parsing, string editing, searching at
arbitrary offsets, and truncation.
-
Processing that usually does not require wide character functions includes
copying or moving data, comparing for equality, processing single-byte data,
and, in most cases and codesets, searching for control characters.
-
Wide character operations are for internal processing only. The data comes
in as multibyte and should leave as multibyte.
-
Follow 8-bit clean and related coding practices, described in the next section.
-
Use wcwidth() and wcswidth() to determine display widths; a single character
can occupy more than one display column.
-
The printf family of functions supports the %S and %C conversions for wide
character strings and wide characters (equivalent to %ls and %lc).
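The following sketch illustrates the pattern described above: data arrives as multibyte, is processed as wide characters, and leaves as multibyte. It assumes the program has already called setlocale(), and it relies on the POSIX behavior of mbstowcs() and wcstombs() when the destination is a null pointer (they return the required length without storing anything). The function name mb_toupper() is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: convert a multibyte string to upper case in the
 * current LC_CTYPE locale. Input is multibyte, processing is done on wide
 * characters, and the result is converted back to multibyte.
 * Returns a malloc'd string the caller must free, or NULL on error. */
char *mb_toupper(const char *mbs)
{
    /* POSIX: a NULL destination returns the length in wide characters. */
    size_t n = mbstowcs(NULL, mbs, 0);
    if (n == (size_t)-1)
        return NULL;                         /* invalid multibyte sequence */

    wchar_t *wcs = malloc((n + 1) * sizeof *wcs);
    if (wcs == NULL)
        return NULL;
    mbstowcs(wcs, mbs, n + 1);               /* includes the terminating L'\0' */

    /* Work one wide character at a time with the towXX()/iswXX() family. */
    for (size_t i = 0; i < n; i++)
        wcs[i] = (wchar_t)towupper((wint_t)wcs[i]);

    /* POSIX: a NULL destination returns the byte count, excluding the NUL. */
    size_t out_len = wcstombs(NULL, wcs, 0);
    char *out = NULL;
    if (out_len != (size_t)-1 && (out = malloc(out_len + 1)) != NULL)
        wcstombs(out, wcs, out_len + 1);     /* convert back to multibyte */

    free(wcs);
    return out;
}
```

In the C locale, mb_toupper("abc 1") returns "ABC 1"; in a multibyte locale the same code also handles characters outside ASCII, because the case mapping is done on whole characters rather than on bytes.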
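Truncation is one place where byte-oriented code can split a multibyte character in half. This sketch, using the hypothetical name mb_truncate_len(), scans forward with mblen(), the only direction mblen() supports, and reports how many leading bytes can be kept under a byte limit without cutting a character. It assumes setlocale() has already been called.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the number of leading bytes of s that fit
 * within max_bytes without splitting a multibyte character. The scan moves
 * forward with mblen(), which reports the length of the character starting
 * at a given byte; it cannot find where a previous character begins. */
size_t mb_truncate_len(const char *s, size_t max_bytes)
{
    size_t remaining = strlen(s);
    size_t kept = 0;

    mblen(NULL, 0);                          /* reset any shift state */
    while (remaining > 0) {
        int len = mblen(s + kept, remaining);
        if (len <= 0)                        /* invalid or incomplete sequence */
            break;
        if (kept + (size_t)len > max_bytes)  /* next character would not fit */
            break;
        kept += (size_t)len;
        remaining -= (size_t)len;
    }
    return kept;
}
```

A caller would copy that many bytes into its buffer and append a terminating NUL, so the truncated string always ends on a character boundary.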
8-bit Clean, Sorting-Collation, and Other I18N Coding Issues
-
Do not use the eighth (top) bit of a character for your own processing. To
help find these problems, check include files and code for bitwise AND and
OR operations with bitmasks equal to 0x7f or 0x80.
-
Do not hardcode character range comparisons such as c >= 'a' && c <= 'z';
use the isXX functions.
-
Do not assume characters fall into the range 0-127.
-
Code that converts char to a wider type may not be 8-bit clean because of
sign extension: where char is signed, bytes with the top bit set become
negative values. Declaring or casting to unsigned char may help, depending
on the processing that needs to be done. Copying or comparing for equality
should not require this, but operations such as indexing into an array can
require the data type to be unsigned char. Occasionally, you must declare
the data as an int in order to compare it against integer quantities.
-
Avoid use of external global variables.
-
Do not hardcode decimal separators.
-
Use display-width functions such as wcswidth(), not byte or character counts,
for centering and aligning text.
-
Do not hardcode specific time, date, monetary, number, name, address,
measurement, and paper size information.
-
Use the I18N versions (wide character or multibyte) of the functions that
gather and process time, date, string comparison, and monetary information;
for example, use strcoll() or wcscoll() rather than strcmp() for collation.
-
Use localeconv() and strfmon() for locale-specific numeric and monetary
information, and strftime() for dates and times.
-
Announce the locale properly, for example by calling setlocale(LC_ALL, "")
at startup. X programs announce the locale differently from non-X programs.
-
Avoid fixed-size buffers when the string length is not known in advance.
Strings in other locales can be up to 200% longer than the same strings in
the C locale.
-
Use perror() or strerror() instead of indexing sys_errlist directly; these
functions can return messages for the current locale.
-
Follow the guidelines in the Code Set Independence section above.
-
Do not hardcode words that have implicit local cultural assumptions.
-
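Regular expression matching has enhancements for I18N, such as character
classes ([[:alpha:]]), equivalence classes ([[=a=]]), and collating symbols
([[.ch.]]). Use them instead of hardcoded ranges such as [a-z].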
-
Do not hardcode the word order in statements that have more than one parameter.
The person who localizes the product uses the %n$ notation in printf format
strings (for example, %2$s before %1$s) to change the word order.
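The 8-bit clean guideline about unsigned char can be made concrete with a short sketch. The hypothetical helpers below index an array by byte value and call an isXX() function; both convert through unsigned char first, because passing a negative char value to isprint() or using it as an index is undefined behavior where plain char is signed.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count how often each byte value occurs in s.
 * The cast to unsigned char keeps bytes with the eighth bit set (such as
 * 0xE9) in the range 0-255 instead of sign-extending them to negative
 * indexes. */
void byte_histogram(const char *s, size_t counts[256])
{
    for (size_t i = 0; i < 256; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;
}

/* Hypothetical helper: count the bytes that the current locale classifies
 * as printable. The isXX() functions accept EOF or an unsigned char value,
 * so the argument is converted before the call. */
size_t count_printable(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isprint((unsigned char)*s))
            n++;
    return n;
}
```

Copying and equality comparisons need no cast; only uses that treat the byte as a number, such as indexing or classification, do.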
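The word-order guideline relies on the %n$ positional conversions, which POSIX (not ISO C) defines for the printf family. In this sketch the format string would normally come from a message catalog; FMT_ORIGINAL and FMT_REORDERED are hypothetical examples of a source string and a translation that names the destination first. The calling code is identical for both.

```c
#include <stdio.h>

/* Hypothetical formats: the original order and a translation that puts
 * the second argument first. Every conversion is positional; mixing
 * positional and non-positional conversions in one format is not allowed. */
const char FMT_ORIGINAL[]  = "Copied %1$s to %2$s";
const char FMT_REORDERED[] = "%2$s: copied from %1$s";

/* Format a "file copied" message with whatever argument order the format
 * string specifies. Returns the snprintf() result. */
int format_copy_msg(char *buf, size_t size, const char *fmt,
                    const char *src, const char *dst)
{
    /* The argument list never changes; the format chooses the order. */
    return snprintf(buf, size, fmt, src, dst);
}
```

With the arguments "a.txt" and "b.txt", the two formats produce "Copied a.txt to b.txt" and "b.txt: copied from a.txt".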
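For the guidelines on decimal separators and localeconv(), the sketch below finds the decimal separator in a number formatted for the current LC_NUMERIC locale by asking localeconv() for it instead of searching for '.'. The function name find_decimal_sep() is hypothetical; the decimal_point member is a string, so the code does not assume the separator is a single byte.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the decimal separator in a
 * number string formatted for the current LC_NUMERIC locale, or NULL if
 * the string has none. The separator comes from localeconv() rather than
 * being hardcoded, because many locales use ',' instead of '.'. */
const char *find_decimal_sep(const char *number)
{
    const char *sep = localeconv()->decimal_point;
    if (sep == NULL || *sep == '\0')
        return NULL;
    return strstr(number, sep);
}
```

In the C locale the separator is "."; after setlocale() selects a locale that uses a comma, the same call finds "," without any code change.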
Messages and Message Catalogs
-
A product usually uses either the gettext or catgets messaging system,
or X resource files, to handle the messages and GUI labels that users
see.
-
Use catopen() and catgets() correctly with respect to message catalog names
and locations, set numbers, and message numbers.
-
Use gettext() and msgfmt correctly with respect to message catalog names and
locations.
-
Set NLSPATH correctly so that message catalogs are found in the proper
place.
-
Product message files are usually found in
<application>/lib/locale/<locale>/LC_MESSAGES. The exact location is
product specific; the important thing is that message files reside in
locale-specific locations.
-
Review message catalogs and their comments:
- Message content should be free of cultural assumptions.
- Messages should be clear to users in other locales.
- Use comments in message files to assist translators
by explaining the meaning of messages, when necessary, and by
indicating words that should not be translated.
- Set numbers and message numbers must not be duplicated in catgets
message files.
-
Do not use fragmented messages, and do not build messages at run time
from fragments. Instead, use one complete message for each
message that a user might see.
-
Do not assume messages will be a certain length. Messages often
increase in length when translated.
-
Do not assume the word order in messages that have more than one value in a
printf statement.
-
In most cases, the following should not be translated:
- File modes
- File and directory names
- Environment variables
- Widget names
- System calls and library names
- System commands
- Programming language keywords
- Header file names
- The getopt arguments
- Pragma ident values
- Messages not intended for the user
- Exec arguments
- Strings with special characters
- Resource names
- Option flags
-
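Initialize strings for catgets or gettext correctly. For example, a string
in a static initializer cannot be passed through gettext() when it is
initialized, so mark it for message extraction and translate it where it is
used.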
-
Use one consistent messaging scheme whenever possible.
-
If the product is not installed, or not installed correctly, for a
given locale, a user running the product in that locale
should see messages and other information from the default locale,
which, on systems running the Solaris operating environment,
is usually the C locale.
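A minimal gettext setup might look like the sketch below. The domain name "myapp" and the directory /opt/myapp/lib/locale are hypothetical, chosen to match the <application>/lib/locale/<locale>/LC_MESSAGES layout described above. gettext() looks for <dir>/<locale>/LC_MESSAGES/<domain>.mo, and when no catalog or translation is found it returns the original message, which gives the C-locale fallback described in the last guideline.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Conventional shorthand for marking translatable messages. */
#define _(msgid) gettext(msgid)

/* Hypothetical domain and catalog directory for this example. */
#define APP_DOMAIN    "myapp"
#define APP_LOCALEDIR "/opt/myapp/lib/locale"

/* Announce the locale, then tell gettext where this product's catalogs
 * live and which domain to use by default. */
void init_messages(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(APP_DOMAIN, APP_LOCALEDIR);
    textdomain(APP_DOMAIN);
}

/* Print one complete, translatable message. The whole sentence is a single
 * message, and the file name is passed as an argument rather than being
 * concatenated from fragments. */
int report_missing(FILE *out, const char *path)
{
    return fprintf(out, _("Cannot open file %s\n"), path);
}
```

If the product has no catalog for the user's locale, report_missing() prints the English text unchanged.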
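For products that use catgets instead, this sketch opens a catalog and falls back to a built-in default string. The catalog name myapp.cat and the set and message numbers are hypothetical. Because the name contains no '/', catopen() searches the directories given by NLSPATH, and NL_CAT_LOCALE makes it use the LC_MESSAGES locale, so setlocale() must be called first.

```c
#include <nl_types.h>

/* Hypothetical catalog name and identifiers. Set and message numbers
 * must be unique within the catalog. */
#define APP_CATALOG "myapp.cat"
#define SET_ERRORS  1
#define MSG_NOFILE  1

/* Open the catalog for the current LC_MESSAGES locale, searching NLSPATH.
 * Returns (nl_catd)-1 if no catalog is found. */
nl_catd app_catopen(void)
{
    return catopen(APP_CATALOG, NL_CAT_LOCALE);
}

/* Return the catalog's text for (set, num), or the built-in default when
 * the catalog could not be opened or does not contain the message. */
const char *app_message(nl_catd cat, int set, int num, const char *dflt)
{
    if (cat == (nl_catd)-1)
        return dflt;
    return catgets(cat, set, num, dflt);
}
```

A caller would use app_message(cat, SET_ERRORS, MSG_NOFILE, "Cannot open file") and close the catalog with catclose() when it was opened successfully.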
X/Motif I18N Programming
-
Use the proper font management methods and routines to set up and
use non-English codesets and fonts, including font sets and font lists.
For some languages, resource files or other locale-specific files name the
actual font lists and font sets.
-
Specific X and Xt routines handle locale announcement and the setting of
NLSPATH and other variables. Call these routines, such as
XtSetLanguageProc(), before any other initialization routines.
-
Use the proper calls and arguments, such as XmStringCreateLocalized(), to
create compound strings.
-
To render non-ASCII text in a Motif DrawingArea widget, you may need to use
lower-level X library calls, such as XmbDrawString().
Motif handles this for its other widgets.
-
Choose widgets that will expand dynamically when a longer message
is shown. Most messages increase in length when translated.
-
Use explicit wide-character functions to center and align text.
-
Icons, graphics, and colors should make no cultural assumptions.
Use resource or other files installed in locale-specific locations to
handle this type of localized data.
-
Applications should handle input methods so that users can enter text in
their own language.
-
Do not hardcode accelerator keys; their values must be localizable.