SPARC is a RISC processor architecture that originally appeared in systems from Sun Microsystems in 1986 as the Sun-4/260. Since then, the processor has undergone many refinements to meet the changing needs of its customers. Today's SPARC processors are designed for both high performance and energy efficiency. This is achieved by incorporating multiple cores on a processor die and multiple processors within a system. These systems yield excellent results when applied to multiprocess and multithreaded jobs. This degree of parallelism and efficiency could not be achieved without atomic instructions, which provide the basis for the synchronization required to achieve the high degree of parallelism associated with the Solaris Operating System (Solaris OS) and SPARC.
This article briefly introduces the SPARC memory model and atomic instructions, then implements a number of IBM AIX interfaces for use on the Solaris OS. The article assumes that the reader is familiar with assembler language programming. This collection of examples demonstrates the use of the SPARC atomic instructions and memory models, and it provides a small library that can be useful for the programmer who wants to port IBM AIX-based source code to the Solaris OS / SPARC platform.
Memory Model
The SPARC Version 9 (SPARC v9) specification defines three memory models, from least restrictive to most restrictive:
Relaxed Memory Order (RMO)
Partial Store Order (PSO)
Total Store Order (TSO)
The SPARC architecture provides multiple memory models for two reasons: so that implementations can schedule memory operations for higher performance, and so that programmers can create synchronization primitives using shared memory. The less-restrictive memory models -- RMO and PSO -- afford more opportunities for application performance improvements by the processor. An idealized SPARC v9 processor has the form shown in Figure 1.
The processor's issue unit reads instructions from memory and issues them in program order. Program order is the order determined by the control flow of the application under the assumption that each instruction executes independently and sequentially. The reorder unit gathers those issued instructions for dispatch to the execute unit. The reorder unit allows an implementation to rearrange instructions to perform some instructions in parallel for greater efficiency. The reordering is constrained to maintain program order. The execute unit carries out the instruction and writes results to a buffer unit. The buffer unit schedules write operations to memory. The presence of the buffer unit frees the execute unit from delays incurred by writing to memory.
The buffer unit can also respond to load requests for a memory address when it holds the contents of the address from a previous store. This introduces a potential for inconsistency -- a write request to memory can be present in the buffer unit, reloaded by the issue unit from the buffer unit, modified by the execute unit again, and the write request enqueued yet again before the buffer unit writes to memory. Although this can in theory produce an inconsistency between processes in a single-processor system, this inconsistency does not in fact occur, because the actions of a process-context switch include a buffer unit flush to memory.
In a multiprocessor system, an inconsistency can occur as a result of the presence of the buffer unit. The inconsistency arises when memory shared between processes is modified by their respective processors but not yet written to memory -- that is, when the modified values are present in the buffer unit of more than one processor. Figure 2 illustrates a multiprocessor system.
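The effect of the buffer unit can be illustrated with a small single-threaded model in C. This is only a sketch under simplifying assumptions: the names `cpu_t`, `cpu_store`, `cpu_load`, and `cpu_flush` are hypothetical, each processor buffers at most one pending store to a single shared word, and real hardware is considerably more complex.

```c
#include <stdbool.h>

/* Hypothetical model: one shared memory word and, per processor,
 * a one-entry store buffer. Not a description of real hardware. */
static int shared_memory = 0;

typedef struct {
    bool pending;   /* true when a store is waiting in the buffer */
    int  value;     /* value of the pending store */
} cpu_t;

/* A store goes to the processor's own buffer, not directly to memory. */
static void cpu_store(cpu_t *cpu, int value)
{
    cpu->pending = true;
    cpu->value = value;
}

/* A load is satisfied from the local buffer when it holds the address. */
static int cpu_load(const cpu_t *cpu)
{
    return cpu->pending ? cpu->value : shared_memory;
}

/* The buffer unit eventually writes the pending store to memory. */
static void cpu_flush(cpu_t *cpu)
{
    if (cpu->pending) {
        shared_memory = cpu->value;
        cpu->pending = false;
    }
}
```

In this model, after `cpu_store(&a, 1)` processor `a` reads 1 while a second processor `b` still reads 0, and the two agree only once `cpu_flush(&a)` has run.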
Membar Instruction
The SPARC v9 architecture includes the membar instruction, which constrains the order in which memory operations are performed.
The ordering
The semantics of each
The SPARC Version 8 (SPARC v8)
Depending on the memory model under which the system is operating, the programmer must explicitly insert memory-ordering instructions to guarantee program correctness. For example, the SPARC assembler code to release a lock using the store unsigned byte (
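The ordering requirement for a lock release can be sketched in portable C11 rather than SPARC assembler. This is an illustration only: the names `lock_byte`, `protected_value`, `release_lock`, and `update_and_release` are hypothetical, and `atomic_thread_fence(memory_order_release)` stands in for the memory-ordering instruction that keeps earlier loads and stores ahead of the store that clears the lock.

```c
#include <stdatomic.h>

/* Assumed convention: 0xFF means held, 0 means free. */
static atomic_uchar lock_byte = 0xFF;
static int protected_value;            /* data guarded by the lock */

/* Release the lock: order all earlier memory operations before the
 * store that clears the lock byte. */
static void release_lock(void)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&lock_byte, 0, memory_order_relaxed);
}

/* Update the protected data, then release. The caller is assumed to
 * already hold the lock. */
static void update_and_release(int value)
{
    protected_value = value;
    release_lock();
}
```

Without the fence, a processor running under a relaxed model would be free to make the lock appear free before the update to the protected data is visible.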
Remember that a lock protects some variable that will be modified after the lock is acquired and before the lock is released. The modification of the variable implies a store to memory by the program. In PSO mode, the
Those who follow the defensive school of programming will develop code assuming the least restrictive model -- RMO -- because this will execute correctly when using the other two SPARC memory models. The defensive school would also gather all synchronization primitives into a common system library.
Atomic Instructions
SPARC machine instructions are normally executed to completion without interruption. This includes the load and store memory access instructions. In multiprocessor, multicore systems, when two or more processes execute instructions that use the same memory address, the accesses are guaranteed to occur in a serial but undefined order. This guarantees memory consistency, but because the order of operations is undefined, so are the memory contents after the operations complete.
Atomic instructions act like both a load and a store, extending the "without interruption" requirement to include both operations. These instructions allow the creation of multithreaded and cooperating multiprocess applications that take advantage of the concurrency offered by today's high-performance systems. SPARC Version 9 (v9) has three atomic instructions:
Load-Store Unsigned Byte: ldstub
The load-store unsigned byte (ldstub) instruction copies a byte from memory into a register and then sets that byte in memory to all ones (0xFF). Here is the algorithm for the instruction, presented in a C-like pseudocode:
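As a minimal sketch of these semantics in portable C11, the function below models the instruction with `atomic_exchange`. The names `ldstub_model`, `spin_lock`, and `spin_unlock` are hypothetical, and the convention that 0 means "free" is an assumption made for the example.

```c
#include <stdatomic.h>

/* Model of ldstub: return the old byte and leave 0xFF in memory.
 * atomic_exchange makes the load and the store one indivisible step. */
static unsigned char ldstub_model(atomic_uchar *addr)
{
    return atomic_exchange(addr, 0xFF);
}

/* A simple spin lock built on the model: 0 means free, 0xFF means held. */
static void spin_lock(atomic_uchar *lock)
{
    while (ldstub_model(lock) != 0) {
        /* the lock was already held; try again */
    }
}

/* Release the lock by storing 0 into the lock byte. */
static void spin_unlock(atomic_uchar *lock)
{
    atomic_store(lock, 0);
}
```

Because the byte is always left at 0xFF, a caller that reads back 0 knows it is the one that changed the byte from free to held.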
The
Swap Register With Memory: swap
The swap register with memory (swap) instruction exchanges the contents of a register with the contents of a word in memory. The algorithm is as follows:
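A minimal C11 sketch of the exchange, assuming a 32-bit word; the name `swap_model` is hypothetical, and `atomic_exchange` stands in for the hardware instruction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of swap: store new_value at addr and return the word that
 * was there before, as one indivisible step. */
static uint32_t swap_model(_Atomic uint32_t *addr, uint32_t new_value)
{
    return atomic_exchange(addr, new_value);
}
```

Unlike the byte-sized operation, the caller chooses the value that is stored, so the word can carry more than a held/free flag.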
Similar to the
Compare and Swap: cas
The SPARC v9 manual introduced the newest atomic instruction: compare and swap (cas). The instruction has an infinite consensus number: it can resolve an infinite number of contending processes in a wait-free fashion. You can use
Here is the algorithm for cas:
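The compare-and-swap step can be sketched in portable C11 as follows. The name `cas_model` and its argument order are assumptions for illustration; `atomic_compare_exchange_strong` supplies the atomicity that the hardware instruction provides.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of compare and swap: if *addr equals expected, store new_value.
 * Either way, return the value that was in memory beforehand. */
static uint32_t cas_model(_Atomic uint32_t *addr, uint32_t expected,
                          uint32_t new_value)
{
    /* On failure, C11 writes the observed value back into expected;
     * on success, expected already equals the old memory value. */
    atomic_compare_exchange_strong(addr, &expected, new_value);
    return expected;
}
```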
To determine whether a swap took place, compare the return value in the second register with the test value used in the first register. Here is what this looks like in this article's pseudocode:
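The success test described above can be sketched in C11 as a comparison of the returned value with the test value. The names `cas_model` and `cas_swapped` are hypothetical, and `cas_model` is a stand-in for the hardware instruction that returns the old contents of memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the atomic instruction: returns the old memory value. */
static uint32_t cas_model(_Atomic uint32_t *addr, uint32_t test,
                          uint32_t new_value)
{
    atomic_compare_exchange_strong(addr, &test, new_value);
    return test;
}

/* The swap took place exactly when the returned value equals the
 * test value that was passed in. */
static bool cas_swapped(_Atomic uint32_t *addr, uint32_t test,
                        uint32_t new_value)
{
    return cas_model(addr, test, new_value) == test;
}
```

When the comparison fails, the returned value is the current contents of memory, which a caller can use as the test value for a retry.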
You can use a simple performance enhancement when you write this in SPARC assembler.
Solaris OS Interfaces
The Solaris OS has many interfaces for use in concurrent and multiprocessor systems. These Solaris thread interfaces date from at least Solaris 2.4 (March 1993). The POSIX thread interfaces were added in Solaris 2.5 (June 1995). These are often referred to as the Pthread interfaces and originally appeared in the 3T manual section. In Solaris 10, both Solaris and POSIX thread interfaces are documented in the Multithreaded Programming Guide.
The Pthread interfaces have been extended over the years to remain POSIX compliant and to add functionality that is not part of the POSIX standard. The thread library has been rewritten at least once. This rewrite harmonized the Solaris OS and POSIX interfaces, integrated the interfaces into
Most vendors have adopted the POSIX thread interfaces, so porting applications between vendor platforms is largely a matter of recompiling as far as the thread interfaces are concerned. However, a simple recompile will not solve one class of problems: if the application uses vendor-specific synchronization primitives, porting it requires more than a recompile. In some cases, this may be trivial. For example, a simple interface mapping can be made from Solaris OS to POSIX thread interfaces with almost perfect fidelity. Other interfaces are not trivial to implement.
IBM AIX Interfaces
The IBM AIX platform has offered a few vendor-specific synchronization primitives since version 3.2 was released in 1992. The paper "Turning the AIX Operating System Into an MP-Capable OS" by Jacques Talbot provides a good background on the PowerPC processor and synchronization under the AIX operating system. The paper is available from the USENIX 1995 Technical Conference Proceedings under the Potpourri II section. The synchronization primitives in question are the following:
fetch_and_add
fetch_and_or
fetch_and_and
_clear_lock
_check_lock
compare_and_swap
test_and_set
Knowing the SPARC atomic instructions and a little SPARC assembler, you can implement the seven AIX synchronization primitives for the Solaris OS. This should result in a compatibility library for porting AIX applications to the Solaris platform. The library will be implemented for Solaris / SPARC v9 platforms running in 32-bit mode. The programmer community will have to provide SPARC 64-bit, AMD, and Intel implementations.
Each of the interfaces in this section will be presented in a C-like pseudocode and implemented in SPARC assembler. All of the interfaces are leaf routines and as such can take advantage of the leaf procedure optimizations described in sections D.5 and H.1.2 of the SPARC Architecture Manual v8.
The fetch_and_add, fetch_and_or, and fetch_and_and Interfaces
The
Here is the algorithm:
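A portable C11 sketch of the three operations, each written as a compare-and-swap retry loop, might look like the following. The `sketch_` names and the parameter types are assumptions chosen for this example and are not the AIX declarations; each function is assumed to return the value the word held before the update.

```c
#include <stdatomic.h>

/* Add value to *word atomically; return the previous contents. */
static int sketch_fetch_and_add(_Atomic int *word, int value)
{
    int old = atomic_load(word);
    /* Retry until no other thread changed *word between the load and
     * the swap. On failure, old is refreshed with the current value. */
    while (!atomic_compare_exchange_weak(word, &old, old + value)) {
    }
    return old;
}

/* OR mask into *word atomically; return the previous contents. */
static unsigned int sketch_fetch_and_or(_Atomic unsigned int *word,
                                        unsigned int mask)
{
    unsigned int old = atomic_load(word);
    while (!atomic_compare_exchange_weak(word, &old, old | mask)) {
    }
    return old;
}

/* AND mask into *word atomically; return the previous contents. */
static unsigned int sketch_fetch_and_and(_Atomic unsigned int *word,
                                         unsigned int mask)
{
    unsigned int old = atomic_load(word);
    while (!atomic_compare_exchange_weak(word, &old, old & mask)) {
    }
    return old;
}
```

The three functions differ only in the operator applied to the old value, which is why a single loop shape serves all of them.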
In SPARC assembler, this becomes the following:
All that remains to complete the three interfaces is the substitution of
Note that two conditional branch features are used: annul and predictive. The annul "
Also, a performance enhancement is made possible by the return value of the
The _clear_lock and _check_lock Interfaces
These two interfaces update a lock word in an atomic manner. The
The algorithms are as follows:
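The two lock-word operations can be sketched in portable C11 as shown below. The `sketch_` names and parameter types are hypothetical. The return convention, false when the lock word matched and was updated and true when it was left unchanged, is an assumption based on the AIX convention and should be checked against the AIX documentation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed semantics: if *word equals old_value, store new_value and
 * return false; otherwise leave *word unchanged and return true. */
static bool sketch_check_lock(_Atomic int *word, int old_value,
                              int new_value)
{
    /* The default (sequentially consistent) compare-exchange also
     * provides the memory barrier needed for synchronization. */
    return !atomic_compare_exchange_strong(word, &old_value, new_value);
}

/* Store value into the lock word. The sequentially consistent store
 * keeps earlier memory operations ahead of the release. */
static void sketch_clear_lock(_Atomic int *word, int value)
{
    atomic_store(word, value);
}
```

A caller would typically spin on `sketch_check_lock(&lock, 0, 1)` until it returns false, and later release with `sketch_clear_lock(&lock, 0)`.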
Because each of these interfaces is intended to be used for synchronization, a memory barrier will be required. Here is the SPARC assembler for them:
Of note in the
The compare_and_swap Interface
The
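A portable C11 sketch of such an interface follows. The name `sketch_compare_and_swap` and the parameter types are hypothetical. The assumed behavior is that the word is compared with the value that `old_value` points to, that a match stores `new_value` and returns true, and that a mismatch copies the current word into `*old_value` and returns false.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed semantics: compare *word with *old_value. On a match, store
 * new_value and return true. On a mismatch, copy the current contents
 * of *word into *old_value and return false. */
static bool sketch_compare_and_swap(_Atomic int *word, int *old_value,
                                    int new_value)
{
    /* C11's compare-exchange has exactly this contract, including the
     * write-back of the observed value on failure. */
    return atomic_compare_exchange_strong(word, old_value, new_value);
}
```

The write-back on failure lets a caller retry immediately with the fresh value instead of reloading the word.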
The SPARC assembler for the interface is a straightforward mapping to the
The only item of note is the return handling, which is written using a conditional branch with the annul bit. If the swap took place, the branch will be taken and the
The test_and_set Interface
This interface is a bitwise test-and-set operation. A bitwise
The implementation will use a loop, but not because the interface is required to succeed before returning -- that is not a requirement. The loop is required because of a race condition between the
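The loop structure can be sketched in portable C11 as follows. The name `sketch_test_and_set` and the return convention are assumptions for illustration: the function is assumed to return true when none of the requested bits were set and they have now been set, and false, leaving the word unchanged, when any requested bit was already set. The retry handles the window in which another thread modifies the word after it has been loaded but before the compare-and-swap runs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed semantics: set the bits in mask only if none of them is
 * already set. Return true if the bits were set by this call. */
static bool sketch_test_and_set(_Atomic unsigned int *word,
                                unsigned int mask)
{
    unsigned int old = atomic_load(word);
    for (;;) {
        if (old & mask)
            return false;   /* a requested bit is already set: no change */
        /* Another thread may have changed *word since it was loaded.
         * On failure, old is refreshed and the test is repeated. */
        if (atomic_compare_exchange_weak(word, &old, old | mask))
            return true;
    }
}
```

The function returns after a single pass unless the word changes underneath it, so a failed test is reported to the caller rather than retried.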
The preceding form assumes the existence of a
And the function is atomic by definition. This second form of
Conclusion
This article has provided an overview of the SPARC v9 processor memory model and atomic instructions as they pertain to multiprocessor systems and shared memory applications. It also provided a Solaris OS implementation of several IBM AIX interfaces that you can use to aid in porting AIX-based applications to the Solaris OS.
Acknowledgments
This article could not have been written without the help of java.sun.com manager Jill Welch and managing editor Christine Dorffi.