Sun Software Product Internationalization Taxonomy

 
 

4.2.3.1 Ordered Lists (Collation)


Description

Collation refers to the comparing and ordering of data into sorted lists or categories. This is often a locale-specific function. The following table shows some locale-specific sorting sequences [1].

Table 4-5. Locale-Specific Ordering

Spanish    German     C
a          a          a
à          à          après
âpre       âpre       azur
après      après      lase
âpreté     âpreté     lassen
azur       azur       laß
être       être       llama
lase       lase       luna
lassen     laß        à
laß        lassen     âpre
luna       llama      âpreté
llama      luna       être

1. O'Donnell, Sandra Martin, Programming for the World: A Guide to Internationalization, Prentice Hall, April 1994, p. 224.
Note that the Spanish "ll" sorts between l and m, so llama is sorted after luna. German does not have a special rule for ll so it sorts llama before luna. Likewise, laß is sorted before lassen in German, but it is the opposite for Spanish. German treats ß as the two letters ss. Since Spanish does not include an ß in its locale definition, it simply sorts it by its encoded value. The C locale sorts everything by its encoded value; hence the considerable difference from the other locales.
Sorting methods used throughout the world include:
  • Multilevel - Involves secondary and tertiary sorting for tie breaks as well as primary sorting.
  • One-to-many - One character sorts as if it were a multicharacter string. For example, the German ß sorts as if it were ss.
  • Many-to-one - A multicharacter string sorts as a single character. For example, the Spanish ch and ll strings are treated as single characters for sorting purposes. The ch is sorted between c and d and the ll between l and m.
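
The many-to-one rule can be illustrated with a small comparison routine. The following is a minimal sketch, not a real locale implementation: the names next_weight() and toy_spanish_compare() are hypothetical, the input is assumed to be lowercase ASCII, and only the ch and ll rules described above are modeled. Production code would normally rely on strcoll() with a Spanish locale instead.

```c
/* Hypothetical helper: reads one collation unit from *s, advances *s
 * past it, and returns its weight. Single letters get even weights, so
 * the odd weight of "ch" falls between c and d, and "ll" between l and m.
 * Assumes lowercase ASCII input. */
static int next_weight(const char **s)
{
    const char *p = *s;
    int w;

    if (p[0] == 'c' && p[1] == 'h') {
        w = 2 * 'c' + 1;                 /* "ch" is one unit after c */
        p += 2;
    } else if (p[0] == 'l' && p[1] == 'l') {
        w = 2 * 'l' + 1;                 /* "ll" is one unit after l */
        p += 2;
    } else {
        w = 2 * (unsigned char)p[0];     /* ordinary single character */
        p += 1;
    }
    *s = p;
    return w;
}

/* Hypothetical comparator: returns <0, 0, or >0 like strcmp(), but
 * compares collation units rather than individual characters. */
int toy_spanish_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int wa = next_weight(&a);
        int wb = next_weight(&b);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    if (*a == '\0' && *b == '\0')
        return 0;
    return *a == '\0' ? -1 : 1;          /* the shorter string sorts first */
}
```

With this sketch, toy_spanish_compare("llama", "luna") returns a positive value, so llama sorts after luna as in the Spanish column of Table 4-5, whereas a plain character-by-character comparison places llama first.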

Command Line Interface

The output generated by many command line interfaces is often influenced by locale-specific sorting rules. For example, the UNIX command ls lists files in a directory. The order in which these files are displayed on screen can differ depending on the ordering rules for the locale.

Character Interface

Sorting issues should not have a major impact on character-based interfaces. It is important to remember, though, that character interface components cannot dynamically change as easily as graphical interface components. This has implications when dealing with ordered lists. If you have a single text component per list item, it is difficult to dynamically rearrange these according to different ordering rules without having a major impact on performance.

Graphical Interface

Sorting is more of an issue for graphical interfaces. Components in graphical interfaces tend to be dynamic because their layout is often determined by user actions. For example, window systems like Microsoft Windows or the X Window System enable users to arrange file icons according to one of several different parameters, including size and name. Arranging by name implies ordering. This means that the software rearranging the icons by name needs to understand the sorting rules for the locale in which it is running and position the icons accordingly.

This is a simple example of ordering in graphical interfaces. A more complex example would involve using a graphical interface to enable users to manipulate text lists. A spreadsheet is a typical example of this. Each list item can be represented by a single graphical component, for example, a button, text field, or label. If the user adds or removes items in the list, the interface must be dynamically rearranged to allow for any change in the ordered list due to the addition or deletion.
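
When an item is added to a list that is already displayed in sorted order, the interface needs to know where the new component belongs. The following is a minimal sketch of that step, assuming the existing items were sorted with strcoll() under the current locale. The function name collated_insert_pos() is hypothetical and is not part of any toolkit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index at which new_item should be
 * inserted so that items[0..n-1], already sorted with strcoll() under
 * the current LC_COLLATE locale, remains sorted. Items that collate
 * equal to new_item stay in front of it. */
size_t collated_insert_pos(const char *const *items, size_t n,
                           const char *new_item)
{
    size_t lo = 0, hi = n;

    /* Binary search using the locale's collation rules. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcoll(items[mid], new_item) <= 0)
            lo = mid + 1;    /* new item belongs after items[mid] */
        else
            hi = mid;        /* new item belongs at or before mid */
    }
    return lo;
}
```

The caller would then shift the graphical components from the returned index onward and place the new one there. Because the comparison goes through strcoll(), the position follows whatever locale the application selected with setlocale().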

Application Protocols

Application protocols sometimes include collation data that can be used as part of an information request or provision. In a request, the data might be used to tell another machine to perform the operation, as in a client request to a server. In providing the information, collation data might simply describe the order.

Storage and Interchange

While information is not usually physically stored in a particular order, it is often indexed this way. Some storage systems need to know the language of the data in order to set up the index properly.

Application Programming Interfaces (APIs)

Collation APIs can sort based on many different criteria. Some use the numeric value of each encoded character, others use tables and algorithms. For results based on language, the latter procedure is necessary. For example, here are some of the C sort functions:
Table 4-6. Sorting Functions

Function                   Description
strcmp() and strncmp()     Compare char strings by encoded value (suitable for ASCII data)
wcscmp()                   Compares wchar_t strings by numeric value
strcoll() and strxfrm()    Collate char strings using the locale's rules
wcscoll() and wcsxfrm()    Collate wchar_t strings using the locale's rules

The strcoll(), strxfrm(), wcscoll(), and wcsxfrm() functions use locale-specific sorting rules. The *coll() functions are slower than the *cmp() functions because the former use collation tables to determine order. To enhance performance, first use strxfrm() or wcsxfrm(), which transform a string into a sequence of numeric values that reflects the current locale's sorting rules. strcmp() (after strxfrm()) or wcscmp() (after wcsxfrm()) can then be used to do numeric comparisons. This is particularly useful when the same data is compared several times: instead of making multiple calls to the slower table-driven *coll() functions, transform each string once and then call strcmp() or wcscmp() as many times as needed.
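
The transform-once approach can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: struct keyed, cmp_keyed(), and sort_with_xfrm() are hypothetical names, and the caller is assumed to have already selected a locale with setlocale(). Only standard C library calls are used.

```c
#include <stdlib.h>
#include <string.h>

/* Pairs an original string with its strxfrm() sort key. */
struct keyed {
    const char *text;   /* original string, not owned */
    char *key;          /* transformed key, allocated with malloc() */
};

/* qsort() callback: plain byte comparison of the precomputed keys. */
static int cmp_keyed(const void *a, const void *b)
{
    const struct keyed *ka = a, *kb = b;
    return strcmp(ka->key, kb->key);
}

/* Hypothetical helper: sorts words[0..n-1] in place according to the
 * current LC_COLLATE locale. Each word is transformed exactly once.
 * Returns 0 on success, -1 on allocation failure. */
int sort_with_xfrm(const char **words, size_t n)
{
    struct keyed *tab = calloc(n ? n : 1, sizeof *tab);
    size_t i;
    int rc = 0;

    if (tab == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        /* With a size of 0, strxfrm() only reports the key length. */
        size_t len = strxfrm(NULL, words[i], 0);
        tab[i].text = words[i];
        tab[i].key = malloc(len + 1);
        if (tab[i].key == NULL) {
            rc = -1;
            break;
        }
        strxfrm(tab[i].key, words[i], len + 1);
    }
    if (rc == 0) {
        qsort(tab, n, sizeof *tab, cmp_keyed);
        for (i = 0; i < n; i++)
            words[i] = tab[i].text;
    }
    for (i = 0; i < n; i++)
        free(tab[i].key);   /* free(NULL) is harmless for unused slots */
    free(tab);
    return rc;
}
```

Sorting n strings this way costs n calls to strxfrm() plus inexpensive strcmp() comparisons, instead of one table-driven strcoll() call per comparison. The trade-off is the extra memory for the keys.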
Example: Danish text sorts æ, ø, and å after z.
Suppose you need to sort some Danish text. Assume strxfrm() assigns numeric values as follows:
a    100
b    101
c    102
...
z    125
æ    126
ø    127
å    128
Given these assignments, suppose your program compares the strings præst and prøve. After running them through strxfrm(), they look like this:
p    r    æ    s    t
115  117  126  118  119

p    r    ø    v    e
115  117  127  121  104
When strcmp() looks at these transformed strings, it finds them equal up to the third letter, where it correctly determines that æ sorts before ø [2].
2. O'Donnell, Sandra Martin, Programming for the World: A Guide to Internationalization, p. 219.
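
The Danish example can be reproduced with a toy transformation. The sketch below is only a stand-in for strxfrm(): the function name toy_danish_xfrm() is hypothetical, the weights are the illustrative values used in the example (a real strxfrm() produces implementation-defined keys), and the input is assumed to be lowercase text encoded in UTF-8.

```c
#include <stddef.h>

/* Toy stand-in for strxfrm(): maps a..z to 100..125, æ to 126,
 * ø to 127, and å to 128. Writes a NUL-terminated key into out and
 * returns its length, or (size_t)-1 if the input contains an
 * unsupported character or the buffer is too small.
 * Assumes lowercase UTF-8 input. */
size_t toy_danish_xfrm(unsigned char *out, size_t cap, const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != '\0') {
        unsigned char w;

        if (*p >= 'a' && *p <= 'z') {
            w = (unsigned char)(100 + (*p - 'a'));
            p += 1;
        } else if (p[0] == 0xC3 && p[1] == 0xA6) {   /* æ in UTF-8 */
            w = 126;
            p += 2;
        } else if (p[0] == 0xC3 && p[1] == 0xB8) {   /* ø in UTF-8 */
            w = 127;
            p += 2;
        } else if (p[0] == 0xC3 && p[1] == 0xA5) {   /* å in UTF-8 */
            w = 128;
            p += 2;
        } else {
            return (size_t)-1;                       /* not modeled */
        }
        if (n + 1 >= cap)
            return (size_t)-1;                       /* no room for NUL */
        out[n++] = w;
    }
    if (cap == 0)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Transforming præst yields the key 115 117 126 118 119 and prøve yields 115 117 127 121 104. Because every weight is nonzero, the keys are ordinary C strings, and strcmp() on them reports that præst sorts first.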

Requirements for Compliance

Command Line Interface

Providers must supply a mechanism for specifying the locale in command line sort functions and these functions must be locale-sensitive, where relevant.
Consumers that parse or manipulate output from shell commands must not make any assumption about the order of that output, and must either provide a locale to the sort function, or determine the locale used in some manner. For example, when using the sort command to order a set of names, the output might not be in English alphabetical order; the order is determined by the locale of the environment. Parsing the sorted data yields different results in different locales, possibly producing an error.

Character Interface

Consumers should always manipulate ordered lists within a single text field component.

Graphical Interface

Providers must supply a mechanism for specifying the locale in the creation of ordered list elements, and must sort them according to the specified locale.
Consumers must ensure that any graphical components that deal with ordered lists can be dynamically rearranged as a result of user or program action. They must display sorted data according to the user's locale where relevant.

Application Protocols

Providers must include locale information in a protocol sorting request.
Consumers must provide locale information according to the protocol when requesting sorted information, or read the locale when receiving sorted information.

Storage and Interchange

Providers may use locale information for indexing stored data.
Consumers must supply locale information to provider storage which requires it for indexing.

Application Programming Interfaces

Providers must allow locale parameters in sort APIs and must sort data according to the given locale.
Consumers must include a locale when calling a sort API.
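
A sort API that takes a locale parameter could look like the following. This is a minimal sketch under stated assumptions: sort_in_locale() and cmp_coll() are hypothetical names, the locale name is passed as a string understood by setlocale(), and the routine is not thread-safe because setlocale() changes process-wide state. POSIX.1-2008 also defines newlocale() and strcoll_l(), which avoid changing the global locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compares two strings with the current collation. */
static int cmp_coll(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical provider API: sorts items[0..n-1] in place using the
 * collation rules of locale_name, then restores the caller's setting.
 * Returns 0 on success, -1 if the locale is unavailable or memory
 * cannot be allocated. */
int sort_in_locale(const char **items, size_t n, const char *locale_name)
{
    const char *cur = setlocale(LC_COLLATE, NULL);   /* query only */
    char *saved;

    if (cur == NULL)
        return -1;
    /* Copy the name: later setlocale() calls may overwrite the buffer. */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        free(saved);
        return -1;            /* requested locale is not available */
    }
    qsort(items, n, sizeof *items, cmp_coll);

    setlocale(LC_COLLATE, saved);   /* restore the caller's collation */
    free(saved);
    return 0;
}
```

A consumer would pass the locale explicitly, for example sort_in_locale(names, count, "C") when the result must be in encoded-value order regardless of the environment. Which other locale names are accepted depends on what is installed on the system.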