Architecture, Design and Testing

Sun Software Product Internationalization Taxonomy

 

4.3.2.1 Characters (Semantics and Codespaces)


Description

The following table describes terms that are used in this section. Several of the definitions are taken from RFC 2130 - The Report of the IAB Character Set Workshop, 29 February - 1 March, 1996.

Table 4-9. Terms and Definitions

Term: Definition
character: General representation of a single written symbol used in a writing system. This can include symbols, punctuation, and, in computer terms, control codes.
character set: Complete group of characters for one or more writing systems. More complete than an alphabet.
glyph: Graphical representation of a character. For example, the character "LATIN SMALL LETTER A" can appear as several differently shaped glyphs for "a", one per typeface.
coded character set: Mapping from a set of abstract characters to a set of integers.
codeset: See coded character set.
character-set-name: Official or unofficial name used to refer to a codeset.
charset: Name used to refer to a defined computer character set standard.
character encoding scheme: Mapping from a coded character set (or several) to a set of octets.
transfer encoding syntax: Transformation applied to data that has been encoded using a character encoding scheme to allow it to be transmitted.
single-byte: Data whose value is 1 byte (8 bits) long.
multi-byte: Data whose value varies in length from 1 byte (8 bits) to 6 bytes (48 bits).

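
The distinction between a coded character set, a character encoding scheme, and a transfer encoding syntax can be easier to see on one concrete character. The following is a minimal sketch that assumes Unicode as the coded character set, UTF-8 and ISO-8859-1 as the encoding schemes, and Base64 as the transfer encoding; none of these choices comes from the taxonomy itself.

```python
import base64

ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Coded character set: abstract character -> integer (Unicode assumed here).
code_point = ord(ch)

# Character encoding scheme: integer -> octets. Two schemes, two results.
utf8_octets = ch.encode("utf-8")          # multi-byte: 2 octets
latin1_octets = ch.encode("iso-8859-1")   # single-byte: 1 octet

# Transfer encoding syntax: octets -> a form that is safe to transmit.
transfer_form = base64.b64encode(utf8_octets)

print(hex(code_point), utf8_octets, latin1_octets, transfer_form)
```

The same character therefore has one integer value but different octet sequences, which is why a receiver needs to know the charset before it can interpret the data.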
A character has no fixed semantics; that is, characters change their behavior depending on the context. Consider the following aspects of character semantics as they relate to program code:
Production
Most English letters are produced using a single keystroke, but to produce the ligature æ, several keystrokes are required. For Asian languages, even more keystrokes might be necessary to produce the desired glyph. Several characters might be necessary to form a single glyph, as in Korean Hangul.
Size
A glyph for a particular character can vary in size and shape from typeface to typeface and from language to language. For example, the character w has a different width and shape in each typeface in which it is set. To illustrate a language context, Polish accent marks are closer to their base letters than French accent marks.
Classification
Some languages treat glyphs as the uppercase and lowercase forms of a single character; others categorize them by their position in a word. Languages written in the Latin, Cyrillic, and Greek scripts distinguish case, whereas those written in the Arabic script have standalone, initial, medial, and final forms.
Equivalence
With the differences in classification come differing rules for equivalence. Even among users of the same language, concepts of character equivalence differ.
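
Three of the aspects above can be illustrated in a few lines. This is a sketch only: it assumes Unicode text and uses Unicode normalization as one possible notion of equivalence, whereas the equivalence rules described above also depend on language and user expectations, which no single library call captures.

```python
import unicodedata

# Production: three Hangul jamo (consonant, vowel, consonant) combine
# into one precomposed syllable under NFC normalization.
jamo = "\u1112\u1161\u11ab"
syllable = unicodedata.normalize("NFC", jamo)

# Classification: Latin letters carry case; Arabic letters do not.
latin_has_case = "a".upper() != "a"
arabic_has_case = "\u0628".upper() != "\u0628"  # ARABIC LETTER BEH

# Equivalence: "e with acute" as one code point versus "e" followed by
# a combining acute accent. They differ code point by code point, but
# compare equal after normalization.
composed = "\u00e9"
decomposed = "e\u0301"
raw_equal = composed == decomposed
canonical_equal = unicodedata.normalize("NFC", decomposed) == composed

print(len(jamo), len(syllable), latin_has_case, arabic_has_case,
      raw_equal, canonical_equal)
```

Code that compares, sorts, or changes the case of characters by their raw values will get these cases wrong, which is why the requirements below route character handling through provider functions.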

Command Line Interface

The command line reads in characters in the form of commands and their parameters and returns characters in the form of data. Command names must stay the same in every localized version of a product, but their parameters might be data in any codeset.

Character Interface

A character interface, like the command line, takes character input and produces character output.

Graphical Interface

For graphical interfaces, characters can be layered onto graphical objects, adding a layer of complexity to character handling.

Application Protocols

Protocols can include character data as part of the protocol stream or identify character data.

Storage and Interchange

Storage and interchange formats usually accommodate character data.

Application Programming Interfaces (APIs)

APIs can take character data as parameters to calls and return character data.

Requirements for Compliance

In general, providers must supply functions that can accommodate any character encoding scheme. Consumers must use provider functions and manage the codesets so as to accommodate data in any of the provider codesets. This means that the consumer must supply the provider functions with required information for correctly processing the data.

Command Line Interface

Providers must supply character functions for reading in and returning character data to the command line in any character encoding scheme they support.
Consumers must use provider-supplied character functions, making sure to accommodate multi-byte as well as single-byte characters for input and output.
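
One common way to break multi-byte input is to cut a byte buffer at a fixed length, which can split a character in half. The sketch below shows a hypothetical helper, `truncate_to_bytes`, that clips a command-line parameter to a byte limit on a character boundary. It assumes a stateless codeset without a byte-order mark (for example UTF-8 or EUC-JP); the helper name and its parameters are illustrative and not part of any provider API.

```python
def truncate_to_bytes(text: str, max_bytes: int, codeset: str = "utf-8") -> bytes:
    """Encode text, keeping only whole characters that fit in max_bytes."""
    out = bytearray()
    for ch in text:
        piece = ch.encode(codeset)  # 1 byte or several, depending on codeset
        if len(out) + len(piece) > max_bytes:
            break  # the next character would not fit completely
        out += piece
    return bytes(out)


# Each of these three characters takes 3 bytes in UTF-8, so a 4-byte
# limit keeps exactly one character instead of one and a third.
clipped = truncate_to_bytes("\u65e5\u672c\u8a9e", 4)
print(clipped.decode("utf-8"))
```

For single-byte data the helper behaves like an ordinary slice, so the same code path serves both cases.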

Character Interface

Providers must supply character functions for reading in and returning character data to the character interface in any character encoding scheme they support.
Consumers must use provider-supplied character functions, making sure to accommodate multi-byte as well as single-byte characters for input and output.

Graphical Interface

Providers must supply character functions for managing character data with various elements of the graphical user interface (GUI), such as buttons, drop-down lists, and title bars. These functions must accommodate all supported character encoding schemes.
Consumers must use provider functions for creating the GUI.

Application Protocols

Providers must construct the protocol so as to accommodate any character data in some specified format.
Consumers must implement the protocol with all related character information, including charset, language, and locale.
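
MIME-style headers are one existing example of a protocol that carries character information alongside the data. The sketch below uses Python's standard `email` package to read the charset and language labels from a small hand-written message and then decode the body accordingly. The message content is made up for illustration, and MIME is an assumed example rather than a format the taxonomy prescribes.

```python
from email import message_from_bytes

# A minimal message: the headers label the charset, the language, and
# the transfer encoding; the body is "caf" plus e-acute in quoted-printable.
raw = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Language: fr\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9\r\n"
)

msg = message_from_bytes(raw)
charset = msg.get_content_charset()      # taken from the Content-Type header
language = msg["Content-Language"]

# Undo the transfer encoding first, then apply the declared charset.
octets = msg.get_payload(decode=True)
body = octets.decode(charset).strip()

print(charset, language, body)
```

Without the charset label, the receiver would have to guess how to interpret the octet 0xE9, and a UTF-8 decoder would reject it.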

Storage and Interchange

Providers must allow for storage of any character data, supplying formats that contain relevant information for proper retrieval.
Consumers must include all relevant information in the storage and interchange formats so that character data in any character encoding scheme can be properly retrieved.
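
A storage format can meet this requirement by recording the charset next to the data. The following sketch defines a hypothetical record layout (one ASCII JSON header line followed by the encoded payload) together with `pack_record` and `unpack_record` helpers. The layout and both function names are assumptions made for illustration.

```python
import json


def pack_record(text: str, codeset: str) -> bytes:
    """Return an ASCII JSON header line followed by the encoded payload."""
    payload = text.encode(codeset)
    header = json.dumps({"charset": codeset, "length": len(payload)})
    return header.encode("ascii") + b"\n" + payload


def unpack_record(blob: bytes) -> str:
    """Read the header, then decode the payload with the stored charset."""
    header_line, _, rest = blob.partition(b"\n")
    header = json.loads(header_line.decode("ascii"))
    payload = rest[: header["length"]]
    return payload.decode(header["charset"])


text = "\u65e5\u672c\u8a9e"
as_sjis = pack_record(text, "shift_jis")
as_utf8 = pack_record(text, "utf-8")
print(unpack_record(as_sjis) == unpack_record(as_utf8))
```

The two records hold different octets, yet both round-trip to the same text because the reader never has to guess the encoding.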

Application Programming Interfaces

Providers must supply interfaces that accommodate any character data, where relevant.
Consumers must pass relevant character data descriptions to the API functions so that the character data is processed properly.