4.3.2.1 Characters (Semantics and Codespaces)
Description
The following table describes terms that are used in this section. Several of the definitions are taken from RFC 2130 - The Report of the IAB Character Set Workshop, 29 February - 1 March, 1996.
Table 4-9. Terms and Definitions
| Term | Definition |
| character | General representation of a single written symbol used in a writing system. This can include symbols, punctuation, and in computer terms, control codes. |
| character set | Complete group of characters for one or more writing systems. More complete than an alphabet. |
| glyph | Graphical representation of a character. For example, the character "LATIN SMALL LETTER A" can appear as visually different glyphs of "a", depending on the typeface. |
| coded character set | Mapping from a set of abstract characters to a set of integers. |
| codeset | See coded character set. |
| character-set-name | Official or unofficial name used to refer to a codeset. |
| charset | Name used to refer to a defined computer character set standard. |
| character encoding scheme | Mapping from a coded character set (or several) to a set of octets. |
| transfer encoding syntax | Transformation applied to data that has been encoded using a character encoding scheme to allow it to be transmitted. |
| single-byte | Data with a value of length 1 byte, or 8 bits. |
| multi-byte | Data with a value of varying length from 1 byte, or 8 bits, to 6 bytes, or 48 bits. |
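The distinction between a coded character set, a character encoding scheme, and a transfer encoding syntax can be illustrated with a short sketch. It assumes Unicode as the coded character set, UTF-8 and ISO 8859-1 as encoding schemes, and Base64 as the transfer encoding; none of these choices comes from this section, and the variable names are illustrative only.

```python
import base64

# Abstract character: LATIN SMALL LETTER E WITH ACUTE.
ch = "\u00e9"

# Coded character set: the character maps to an integer (Unicode is assumed here).
code_point = ord(ch)                      # 233 (0xE9)

# Character encoding scheme: the integer maps to a sequence of octets.
utf8_octets = ch.encode("utf-8")          # two octets (multi-byte)
latin1_octets = ch.encode("iso-8859-1")   # one octet (single-byte)

# Transfer encoding syntax: the octets are transformed so they can be transmitted.
transfer_form = base64.b64encode(utf8_octets)

print(code_point, utf8_octets, latin1_octets, transfer_form)
```

The same abstract character therefore has one integer value but different octet sequences, depending on which encoding scheme and transfer encoding are applied.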
A character has no fixed semantics; that is, characters change their behavior depending on the context. Consider the
following aspects of character semantics as they relate to program code:
Production
Most English letters are produced using a single keystroke, but to produce the ligature æ, several
keystrokes are required. For Asian languages, even more keystrokes might be necessary to produce the desired glyph. Several
characters might be necessary to form a single glyph, as in Korean Hangul.
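As a rough illustration of several characters forming a single glyph, the sketch below composes three Hangul conjoining jamo into one syllable. It assumes Unicode data and normalization form NFC, as exposed by Python's standard `unicodedata` module; this section does not prescribe either.

```python
import unicodedata

# Three conjoining jamo: HIEUH (U+1112), A (U+1161), NIEUN (U+11AB).
jamo = "\u1112\u1161\u11ab"

# NFC composition combines the sequence into one precomposed syllable.
syllable = unicodedata.normalize("NFC", jamo)

print(len(jamo), len(syllable), unicodedata.name(syllable))
```

A program that counts or truncates characters must therefore decide whether it is working with the individual components or with the composed unit that the user sees.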
Size
A glyph for a particular character can vary in size and shape from typeface to typeface and language to language. For
example, the character w has a noticeably different glyph in each typeface in which it is rendered.
To illustrate a language context, Polish accent marks are closer to their base letters than French accent marks.
Classification
Some languages treat glyphs as the uppercase and lowercase forms of a single character; others categorize them by
their position in a word. Languages written in the Latin, Cyrillic, and Greek scripts have a case distinction, whereas those written
in the Arabic script have standalone, initial, medial, and final forms.
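The two classification schemes can be contrasted in a small sketch. It assumes the Unicode character database shipped with Python, in which the standalone form is called "isolated" and the positional variants of Arabic letters are recorded as compatibility characters; the letter BEH is used only as an example.

```python
import unicodedata

# Latin, Cyrillic, and Greek letters carry a case distinction.
for upper, lower in [("A", "a"), ("\u0414", "\u0434"), ("\u0394", "\u03b4")]:
    assert upper.lower() == lower and lower.upper() == upper

# ARABIC LETTER BEH has no case: it is classified as "Lo" (letter, other)
# and case conversion leaves it unchanged.
beh = "\u0628"
category = unicodedata.category(beh)
unchanged = beh.upper() == beh

# Its positional variants U+FE8F..U+FE92 each decompose to BEH with a tag
# that names the position in the word.
form_tags = [unicodedata.decomposition(chr(cp)).split()[0]
             for cp in range(0xFE8F, 0xFE93)]

print(category, unchanged, form_tags)
```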
Equivalence
With the differences in classification come differing rules for equivalence. Even among different users of the same
language there are different concepts of character equivalency.
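One way to see that equivalence depends on the rule applied is the sketch below. It assumes Unicode canonical equivalence (NFC) and Unicode case folding as two possible rules; other rules, such as locale-specific collation, are outside what it shows.

```python
import unicodedata

composed = "\u00e9"       # single code point: e with acute
decomposed = "e\u0301"    # letter e followed by COMBINING ACUTE ACCENT

# A code-point comparison treats the two spellings as different.
raw_equal = composed == decomposed

# Under canonical equivalence they are the same character.
canonical_equal = unicodedata.normalize("NFC", decomposed) == composed

# Under case folding, German sharp s is equivalent to "ss".
folded_equal = "STRASSE".casefold() == "stra\u00dfe".casefold()

print(raw_equal, canonical_equal, folded_equal)
```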
Command Line Interface
The command line reads in characters in the form of commands and their parameters and returns characters in the form of
data. While the commands themselves must remain constant regardless of which localized product they are in, the parameters might be
data in any codeset.
Character Interface
A character interface, like the command line, takes character input and produces character output.
Graphical Interface
For graphical interfaces, characters can be layered onto graphical objects, adding a layer of complexity to character
handling.
Application Protocols
Protocols can include character data as part of the protocol stream or identify character data.
Storage and Interchange
Storage and interchange formats usually accommodate character data.
Application Programming Interfaces (APIs)
APIs can take character data as parameters to calls and return character data.
Requirements for Compliance
In general, providers must supply functions that can accommodate any character encoding scheme. Consumers must use
provider functions and manage the codesets so as to accommodate data in any of the provider codesets. This means that the consumer
must supply the provider functions with required information for correctly processing the data.
Command Line Interface
Providers must supply character functions for reading in and returning character data to the command line in any
character encoding scheme they support.
Consumers must use provider-supplied character functions, making sure to accommodate multi-byte characters for input
and output, as well as single-byte characters.
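A minimal sketch of accommodating multi-byte input follows. It assumes that input arrives in arbitrary chunks and that Python's standard `codecs` module plays the role of the provider-supplied character functions; `decode_chunks` is a hypothetical helper, not a function defined by this specification.

```python
import codecs

def decode_chunks(chunks, encoding):
    """Decode octet chunks that may split a multi-byte character."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # The decoder buffers an incomplete trailing sequence until the next chunk.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises an error if the input ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

data = "日本語".encode("utf-8")   # 9 octets, 3 octets per character
chunks = [data[:4], data[4:]]    # the split falls inside the second character
text = decode_chunks(chunks, "utf-8")
print(text)
```

Decoding each chunk independently would fail at the split point, which is why the consumer passes the encoding name to a stateful decoder instead of handling octets itself.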
Character Interface
Providers must supply character functions for reading in and returning character data to the character interface in any
character encoding scheme they support.
Consumers must use provider-supplied character functions, making sure to accommodate multi-byte characters for input
and output, as well as single-byte characters.
Graphical Interface
Providers must supply character functions for managing character data with various elements of the graphical user
interface (GUI), such as buttons, drop-down lists, and title bars. These functions must accommodate all supported character
encoding schemes.
Consumers must use provider functions for creating the GUI.
Application Protocols
Providers must construct the protocol so as to accommodate any character data in some specified format.
Consumers must implement the protocol with all related character information, including charset, language, and
locale.
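As one possible illustration of carrying charset and language information in a protocol, the sketch below builds a MIME message with Python's standard `email` package. MIME is an assumption chosen for the example; this section does not name a specific protocol, and the message text is arbitrary.

```python
from email.message import EmailMessage

text = "Zażółć gęślą jaźń"

msg = EmailMessage()
# The charset is recorded in the Content-Type header next to the data.
msg.set_content(text, charset="utf-8")
# The language is carried in a separate header.
msg["Content-Language"] = "pl"

declared_charset = msg.get_content_charset()
declared_language = msg["Content-Language"]
print(declared_charset, declared_language)
```

A receiver can then read the declared charset from the message itself and does not need to guess how to interpret the octets.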
Storage and Interchange
Providers must allow for storage of any character data, supplying formats that contain relevant information for proper
retrieval.
Consumers must include all relevant information in the storage and interchange formats so that character data in any
character encoding scheme can be properly retrieved.
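The requirement to store the information needed for retrieval can be sketched with a deliberately simple container. The format (an ASCII `charset=` header line followed by the encoded octets) and the `store` and `retrieve` helpers are hypothetical and exist only for this example.

```python
def store(text, charset):
    """Return a blob that records its own charset ahead of the data."""
    header = f"charset={charset}\n".encode("ascii")
    return header + text.encode(charset)

def retrieve(blob):
    """Read the recorded charset, then decode the payload with it."""
    # Split at the first newline only; the payload may contain that octet too.
    header, _, payload = blob.partition(b"\n")
    charset = header.decode("ascii").split("=", 1)[1]
    return payload.decode(charset)

blob = store("日本語", "euc-jp")
restored = retrieve(blob)
print(restored)
```

Because the charset travels with the data, the same retrieval code works for any encoding scheme the provider supports.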
Application Programming Interfaces
Providers must supply interfaces that accommodate any character data, where relevant.
Consumers must pass the relevant character data descriptions to the API functions so that the character data is
processed properly.