I18N Guidelines for C and C++

These guidelines cover I18N practices for products and applications written in C or C++. Many of the same principles also apply to the Java language. Not all of these guidelines are appropriate for every product, application, or version of the Solaris operating environment. For more information, see the references on individual pages.

Code Set Independence:
Wide Characters and Avoiding EUC Dependencies

  1. Use wide character functions and declarations to process characters or strings that may contain multibyte data. This processing may include file paths, command line options, file contents, or other program information, input or output. There are wide character functions to substitute for various multibyte string, time, and I/O functions, as well as functions that convert between wide character and multibyte. See the man page references for more information. There are also wide character versions of the isXX() functions (the iswXX() family, such as iswalpha()).
  2. You can sometimes use the mblen() function in the situations described above, but use caution: it works only when scanning forward from the start of a character, so you cannot use it to scan backwards.
  3. Avoid EUC dependencies, including the use of EUC-related include files, functions, and declarations. Also avoid the csetlen, csetcol, and csetno functions.
  4. Do not hardcode any non-ASCII characters.
  5. Processing that may require the use of wide character functions and declarations includes collation, sorting, parsing, string editing, searching random offsets, and truncation.
  6. Processing that may not require the use of wide character functions includes copying or moving data, comparing for equality, searching for control characters in most cases and codesets, and processing single-byte data.
  7. Wide character operations are for internal processing only. The data comes in as multibyte and should leave as multibyte.
  8. Be sure that 8-bit clean and other related coding practices are followed.
  9. Use wcwidth and wcswidth for handling display widths.
  10. The printf family of calls has the wide character conversion specifiers %S and %C, which are equivalent to %ls and %lc.
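
Guidelines 1 and 7 above describe a round trip: multibyte data is converted to wide characters, processed, and converted back to multibyte. The following is a minimal sketch of that pattern, not a definitive implementation. It assumes a POSIX system (where mbstowcs() and wcstombs() return a length when given a null destination) and a caller that has already called setlocale(LC_ALL, ""). The helper name mb_upper is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: uppercase a multibyte string under the current
 * LC_CTYPE locale. Returns 0 on success, or -1 on an invalid sequence,
 * an allocation failure, or a result that does not fit in `out`.
 */
int mb_upper(const char *in, char *out, size_t outsize)
{
    /* POSIX: a null destination makes mbstowcs() return the character count. */
    size_t nchars = mbstowcs(NULL, in, 0);
    if (nchars == (size_t)-1)
        return -1;

    wchar_t *wide = malloc((nchars + 1) * sizeof *wide);
    if (wide == NULL)
        return -1;
    mbstowcs(wide, in, nchars + 1);           /* multibyte in */

    for (size_t i = 0; i < nchars; i++)       /* wide character processing */
        wide[i] = (wchar_t)towupper((wint_t)wide[i]);

    int rc = -1;
    size_t nbytes = wcstombs(NULL, wide, 0);  /* bytes needed, without the NUL */
    if (nbytes != (size_t)-1 && nbytes < outsize) {
        wcstombs(out, wide, outsize);         /* multibyte out */
        rc = 0;
    }
    free(wide);
    return rc;
}
```

The output buffer is sized by the caller because the converted string can need a different number of bytes than the input; the function refuses to write a result that would be cut off.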

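Truncation is one of the operations listed above that must respect character boundaries. The sketch below scans forward with mblen(), which is the only direction in which that function can be used. The helper name mb_truncate is hypothetical, and the locale is assumed to have been set by the caller.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: truncate `s` in place to at most `max_chars`
 * characters without cutting through a multibyte character.
 * Returns the number of characters kept.
 */
size_t mb_truncate(char *s, size_t max_chars)
{
    size_t kept = 0;

    (void)mblen(NULL, 0);                 /* reset any shift state */
    while (*s != '\0' && kept < max_chars) {
        int len = mblen(s, MB_CUR_MAX);   /* bytes in the next character */
        if (len <= 0)
            break;                        /* invalid sequence: cut here */
        s += len;
        kept++;
    }
    *s = '\0';
    return kept;
}
```

Cutting at a byte offset instead (for example, s[max] = '\0') could leave half of a multibyte character at the end of the string.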

8-bit Clean, Sorting-Collation, and Other I18N Coding Issues

  1. Do not use the eighth or top bit of a character for processing. Check include files and code for bitwise ANDs and ORs with bitmasks equal to 0x7f or 0x80 to help find these problems.
  2. Do not hardcode character range comparisons; use the isXX functions.
  3. Do not assume characters fall into the range 0-127.
  4. Code that casts char to other lengths may not be 8-bit clean. This relates to sign extension. Declaring or casting to unsigned char may help in this case, depending on the processing that needs to be done. Copying or comparing for equality should not require this, but operations such as indexing into an array can require that the data type be unsigned char. Occasionally, you must declare the data as an int in order to compare data against integer quantities.
  5. Avoid use of external global variables.
  6. Do not hardcode decimal separators.
  7. Use special functions for centering and aligning.
  8. Do not hardcode specific time, date, monetary, number, name, address, measurement, and paper size information.
  9. Use the I18N-aware versions (wide character or multibyte) of the functions that gather and process time, date, and monetary information and that compare strings (for example, strcoll rather than strcmp when ordering text).
  10. Use localeconv and strfmon for locale, monetary, number, and date information.
  11. Use proper locale announcements. X programs have different locale announcements than non-X programs.
  12. Avoid fixed-size buffers when dealing with strings of unknown length. Strings in other locales can be up to 200% longer than strings in the C locale.
  13. Use perror or strerror instead of sys_errlist.
  14. Follow the guidelines in the Code Set Independence section.
  15. Do not hardcode words that have implicit local cultural assumptions.
  16. Regular expression matching has some enhancements for use with I18N.
  17. Do not hardcode the word order in statements that have more than one parameter. The person who localizes the product will use the %n$ positional notation in printf (for example, %1$s and %2$s) to handle the word order.
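
Guidelines 2 through 4 above can be illustrated with a short sketch. It shows the two places where sign extension most often breaks 8-bit data: calls to the isXX functions and array indexing. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic bytes, as classified by the current LC_CTYPE locale. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first: passing a negative char to isalpha() is undefined. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}

/* Tally how often each byte value occurs; hist must have 256 entries. */
void byte_histogram(const char *s, size_t hist[256])
{
    for (size_t i = 0; i < 256; i++)
        hist[i] = 0;
    for (; *s != '\0'; s++)
        hist[(unsigned char)*s]++;   /* a signed char could index below 0 */
}
```

Neither function masks with 0x7f or compares against a hardcoded range such as 'a' to 'z', so bytes above 127 pass through unchanged and are classified by the locale.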

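As an illustration of not hardcoding the decimal separator, the sketch below asks localeconv() for the radix character of the current locale. The helper name format_amount and its two-digit fraction are assumptions made for the example; monetary output in a real product would more likely go through strfmon.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<whole><separator><cents>" using the
 * decimal separator of the current LC_NUMERIC locale.
 * Returns the value returned by snprintf().
 */
int format_amount(char *buf, size_t size, long whole, unsigned cents)
{
    const struct lconv *lc = localeconv();

    return snprintf(buf, size, "%ld%s%02u",
                    whole, lc->decimal_point, cents % 100);
}
```

In the C locale the separator is a period. After setlocale(LC_ALL, "") in a locale that uses a comma, the same call produces a comma with no change to the code.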

Messages and Message Catalogs

  1. A product will usually use the gettext or catgets messaging system, or X resource files, for handling messages and GUI labels that the user sees.
  2. Use catopen and catgets properly as related to message catalog names and locations, set numbers, and message numbers.
  3. Use gettext and msgfmt properly as related to message catalog names and locations.
  4. NLSPATH should be properly set to get message catalogs from the proper place.
  5. Product message files are usually found in <application>/lib/locale/<locale>/LC_MESSAGES. The exact location is product specific; the important thing is that message files should reside at locale-specific locations.
  6. Review message catalogs and the comments in them:
    • Message content should be free of cultural assumptions.
    • Messages should be clear to users in another locale.
    • Use comments in message files to assist developers by explaining the meaning of messages, when necessary, and by indicating words that should not be translated.
    • Set and message numbers must not be duplicated within catgets message files.
  7. Do not use fragmented messages, and do not create messages dynamically from fragments. Instead, use one complete message for each message that a user might see.
  8. Do not assume messages will be a certain length. Messages often increase in length when translated.
  9. Do not assume the word order in messages that have more than one value in a printf statement.
  10. In most cases, the following should not be translated:
    • File modes
    • File and directory names
    • Environment variables
    • Widget names
    • System calls and library names
    • System commands
    • Programming language reserved words
    • Header file names
    • The getopt arguments
    • Pragma ident values
    • Messages not intended for the user
    • Exec arguments
    • Strings with special characters
    • Resource names
    • Option flags
  11. Initialize strings for catgets or gettext in the proper way.
  12. Use one consistent messaging scheme whenever possible.
  13. If the product is not installed or not installed correctly in a given locale, then a user running the product in that locale should see messages and other information from the default locale, which, on systems running the Solaris operating environment, is usually the C locale.
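
Several of the catgets guidelines above (one complete message per user-visible string, no assumed word order, and a fallback to the default locale) can be seen together in a short sketch. The set number, message number, and function name are hypothetical, and the positional %1$s and %2$s notation assumes a POSIX printf.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers for an application catalog. */
#define SET_FILEOPS 1
#define MSG_COPIED  1

/*
 * Format the "copied" message into buf. `cat` is whatever catopen()
 * returned, including (nl_catd)-1 when no catalog could be opened.
 */
int format_copied(char *buf, size_t size, nl_catd cat,
                  const char *src, const char *dst)
{
    /* The last argument is returned when the catalog or message is missing. */
    const char *fmt = catgets(cat, SET_FILEOPS, MSG_COPIED,
                              "Copied %1$s to %2$s");

    return snprintf(buf, size, fmt, src, dst);
}
```

A caller would typically call setlocale(LC_ALL, "") and then catopen() with the product's catalog name and NL_CAT_LOCALE before using the helper. Because the arguments are numbered, a translated entry in the catalog can place %2$s before %1$s without any change to the code.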

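For gettext, one common reading of "initialize strings in the proper way" is that statically initialized strings cannot call gettext(), so they are only marked for extraction and translated later, at the point of use. The sketch below assumes that reading. The N_ macro is a widely used convention rather than part of the library, and the message texts and function name are hypothetical.

```c
#include <libintl.h>

/* Marks a string for extraction (for example, by xgettext) without
   translating it; static initializers cannot call gettext(). */
#define N_(msgid) (msgid)

/* Complete messages, not fragments such as "File " + "not " + "found". */
static const char *const status_msgid[] = {
    N_("File not found."),
    N_("File found."),
};

/* Translate at the point of use, after the locale has been set. */
const char *file_status_msg(int found)
{
    return gettext(status_msgid[found ? 1 : 0]);
}
```

When no catalog is installed for the user's locale, gettext() returns the original string, which gives the default-locale behavior described in guideline 13.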

X/Motif I18N Programming

  1. Use the proper font management methods and routines for setting up and using non-English codesets and fonts, including font sets and font lists. Some languages allow resource or other locale-specific files to name the actual font lists and sets.
  2. There are specific X and Xt routines for proper locale announcement and setting of NLSPATH and other variables. These routines should occur before other initialization routines.
  3. Use the proper calls and arguments for creating compound strings (for example, XmStringCreateLocalized).
  4. If foreign characters are to be rendered in a Motif DrawingArea widget, it is possible that lower level X library calls should be used. Motif takes care of this for its other widgets.
  5. Choose widgets that will expand dynamically when a longer message is shown. Most messages increase in length when translated.
  6. Use explicit wide-character functions to center and align text.
  7. Icons, graphics, and colors should make no cultural assumptions. Use resource or other files installed in locale-specific locations to handle this type of localized data.
  8. Input methods should be handled in applications.
  9. Accelerator keys should not have hardcoded values.
