
Solaris Developer Chat Sessions

 
 


October 18, 2001

Chat Title: Techniques for Optimizing Applications: High Performance Computing
Guest Speakers: Rajat P. Garg and Ilya Sharapov

This is a moderated chat.

adele: Hello, and welcome to Solaris Live! Our guests today are Rajat P. Garg and Ilya Sharapov, authors of "Techniques for Optimizing Applications: High Performance Computing". Rajat or Ilya, can you give us an introduction to your book and the topics it covers?

rajat: The book is primarily written for developers of computationally intensive programs who want to optimize their applications on Sun UltraSPARC systems. We cover a full spectrum of topics: from measuring performance to compiler optimizations, linking optimized libraries, source code modifications, and an extensive discussion of program parallelization methods.

test: What tools are discussed?

ilya: We have a few chapters on Sun compilers and discuss various features and options for serial optimization and parallelization. Special attention is paid to performance monitoring tools and profiling tools as they apply to serial and parallel programs.

test: Do you have examples of how optimization can be applied?

rajat: Yes, we have numerous examples in the book on how a particular optimization trick works. One of our objectives was to have a complete example program demonstrating a specific technique. For example, a program that shows how -xrestrict compiler optimization works -- before using this option and after using this option. The examples come with the book and can be downloaded from http://sun.com/blueprints/tools.

Joseph F. McGrath: In one example, the lines
A(I) = B(I) / C(I)
D(I) = E(I) / F(I)

inside of a loop are replaced by
TEMP = 1 / ( C(I) * F(I) )
A(I) = B(I) * F(I) * TEMP
D(I) = E(I) * C(I) * TEMP

Presumably, the additional multiplies are offset by the reduction from two divides to one divide. How does one search out savings of the sort shown above?

rajat: Joe, this is an example of strength reduction. It actually was motivated by a similar optimization in a SPEC benchmark. Floating-point divides are typically more expensive than floating-point multiplies and additions. The various tools that we discuss in the book will help identify such tuning opportunities. In particular, I would like to mention the analyzer tool. You can use this tool and, using the annotated disassembly/source feature, find the locations in the source where the hotspots are. Identifying the hotspot is the tough part.
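
The rewrite Joe quotes can be sketched in C as follows. This is only an illustration of the idea, not the example from the book: the function names `divide_twice` and `divide_once` are hypothetical, and the arrays are assumed to contain no zeros.

```c
#include <stddef.h>

/* Original form: two floating-point divides per iteration. */
void divide_twice(size_t n, const double *b, const double *c,
                  const double *e, const double *f,
                  double *a, double *d)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] / c[i];
        d[i] = e[i] / f[i];
    }
}

/* Rewritten form: one divide, then multiplies by the shared reciprocal. */
void divide_once(size_t n, const double *b, const double *c,
                 const double *e, const double *f,
                 double *a, double *d)
{
    for (size_t i = 0; i < n; i++) {
        double temp = 1.0 / (c[i] * f[i]);
        a[i] = b[i] * f[i] * temp;   /* equals b/c in exact arithmetic */
        d[i] = e[i] * c[i] * temp;   /* equals e/f in exact arithmetic */
    }
}
```

The two versions are algebraically equivalent, but in general they need not agree to the last bit, because the rewritten form rounds at different points. Whether it is actually faster depends on the processor's divide and multiply costs, so it is worth measuring on the target system.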

test: What optimizations at the source code level are discussed in your book?

ilya: We discuss general optimization techniques, such as optimization for memory hierarchies; for example, cache blocking or reducing cache conflicts, aliasing optimizations, and optimization related to data alignment. Special focus is on loop optimizations, such as loop tiling/unrolling, fusion/fission, peeling, etc.
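
As a rough illustration of the loop tiling (cache blocking) idea Ilya mentions, here is a minimal C sketch of a matrix transpose done both ways. The function names and the `TILE` value of 32 are assumptions made for this example, not taken from the book; a suitable tile size depends on the cache being targeted.

```c
#include <stddef.h>

#define TILE 32  /* hypothetical tile size; tune for the target cache */

/* Straightforward transpose: dst is written with a stride of n elements. */
void transpose_naive(size_t n, const double *src, double *dst)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Tiled transpose: processes TILE x TILE blocks so that the parts of
   src and dst being touched are more likely to stay in cache. */
void transpose_tiled(size_t n, const double *src, double *dst)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clamp the block edges when n is not a multiple of TILE. */
            size_t imax = (ii + TILE < n) ? ii + TILE : n;
            size_t jmax = (jj + TILE < n) ? jj + TILE : n;
            for (size_t i = ii; i < imax; i++)
                for (size_t j = jj; j < jmax; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result; only the order in which elements are visited changes.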

oscar: Is the information in the book also useful for system administrators?

rajat: Oscar, yes, there is information in the book that might be of use to system administrators. Specifically, we cover topics such as a description of features of our product lines (hardware, software) and Solaris commands to identify system configuration (such as processor speed, size of memory, size of caches, Solaris kernel settings, versions of compilers, HPC ClusterTools, etc.). We do not cover detailed sys-admin topics such as networking setup, disk management, etc. The primary target audience of the book is application developers.

Bill Walster: Hi Rajat and Ilya. Bill Walster here, so you know the kind of question I have. I have recently been playing with a little example that most programmers will think is obvious: Suppose I am computing the expression (1/a) + (1/b). There are two divides and one add. It seems obvious that the equivalent form (a + b)/(a*b) will be faster, because one divide is replaced by a multiply. The problem is that the latter expression produces a more accurate result than the former. To what extent in your book do you consider how different performance optimizations may impact result accuracy?

Bill Walster: P.S. to my question: I should have mentioned that there is a difference in accuracy when rounding errors are made in the process of performing the computation. If everything is machine representable, then there is no problem.

ilya: Hi Bill, yes, we do address potential roundoff problems, as well as the general issue of correctness of the results. We have a section on IEEE arithmetic and discuss where the problems can come from. For each of the optimization techniques (e.g. compiler optimizations, such as -fsimple, -xtrap, etc.), we discuss the impact on numerical values and the correctness of execution. A related issue is covered in one of the appendices: interval arithmetic. We explain how the interval approach can help validate the results of computations as well as provide additional means of solving complex nonlinear problems.
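
To make the two forms in Bill's question concrete, here is a small C sketch. The function names are hypothetical, and the remark about overflow below assumes ordinary IEEE 754 double-precision evaluation; it is a separate hazard from the rounding behavior Bill describes.

```c
/* Two divides and one add. */
double sum_recip_two_divides(double a, double b)
{
    return 1.0 / a + 1.0 / b;
}

/* One divide, one multiply, one add. */
double sum_recip_one_divide(double a, double b)
{
    return (a + b) / (a * b);
}
```

For exactly representable inputs such as a = 2 and b = 4, both forms return 0.75. For other inputs they can differ in the low-order bits because they round at different steps. The ranges also differ: with a = b = 1e200, the product a*b overflows in double precision, so the one-divide form returns 0 while the two-divide form returns a small positive value. Rewrites of this kind are therefore worth checking against the expected range of the data as well as for speed.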

test: I am porting a Fortran program from Cray computers (where default size of real and integer variables is 8-bytes). How can I promote real and integer variables to 8-bytes on the Sun platform without hand-changing the source?

rajat: Test: We have a compiler option that facilitates porting of code where the default sizes of basic data types need to be changed. For your specific case, you can use the -xtypemap option, as in f90 -xtypemap=real:64,integer:64 file.f90. This promotes default real and integer variables in the program to 64 bits (8 bytes). Note that this does not work for explicitly sized variables. For example, if you declare x,y,z as real*4 x,y,z, then -xtypemap will not have any effect on them.

Bill Walster: Answer to "test". There is a command line option for the compiler, -xtypemap, that enables one to set the default length for floating-point and integer variables. This has been introduced expressly to help people port Cray and CDC codes without having to change their code. One caution, however: the Fortran 95 standard is "picky" about literal constants. For example, 0.1 in a double expression is treated as a single precision constant, rather than a double precision constant, which I believe is what most users want and expect. So, beware of the type of literal constants.

ilya: Bill, yes, -xtypemap should be used with caution, and in some cases can cause problems. In C there is another related option: -xsfpconst, which changes the default size of FP constants to single precision instead of the default double. This can be very handy for single-precision C codes.
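
The effect of a literal constant's type can be seen directly in standard C, where an unsuffixed constant such as 0.1 is a double and 0.1f is a float. The sketch below uses hypothetical function names and shows only the language-level behavior, not the effect of any particular compiler option.

```c
/* 0.1f is rounded to single precision first, then widened to double. */
double tenth_from_float_literal(void)
{
    return 0.1f;
}

/* 0.1 is rounded directly to double precision. */
double tenth_from_double_literal(void)
{
    return 0.1;
}
```

The two return values differ by roughly 1.5e-9, which is the single-precision rounding error of 0.1 carried into the double result. This is the kind of discrepancy to watch for when default sizes are changed but the literal constants in the source are not.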

test: Half of the book covers parallelization; is this a manual for parallelizing applications?

ilya: It isn't really a manual; we mostly discuss optimization aspects of parallel programming assuming that the reader has some parallelization skills, or at least a good reference handy. But we do provide theoretical background and perspective for parallelization and parallel performance. We also talk about the usage of compilers and other tools for parallelizing applications and monitoring parallel performance.

Joseph F. McGrath: A generalization of mine on parallel processing has been questioned lately. In scientific and engineering applications, the approaches to parallelization on shared-memory and distributed-memory platforms have been narrowing down to OpenMP and MPI, respectively. That is, all other specifications and standards are falling by the wayside. Am I correct? Which others continue to be widely used? Which others show promise for the future?

rajat: Joe: Yes, we seem to be converging towards using MPI for message-passing programs and OpenMP for compiler-directive based parallelization on shared address space systems. The one other approach which still continues to be in use is explicit multithreading via the use of P-threads. This is primarily used in C (& C++) applications since the APIs were developed for the C language. Once some of the current deficiencies of the OpenMP standard are addressed (such as support for dynamic task creation/destruction, signal and exception handling), I think people will move their P-threads applications to OpenMP based applications. In the future, the approaches which provide explicit support to take advantage of non-uniformity of memory accesses in a shared address space cache-coherent system but maintain simplicity of programming will be the ones to become popular. In my opinion, OpenMP (with such extensions) is promising.

ilya: In case some of the participants plan to go to Supercomputing 2001, we'll have a copy of the book on display in the Sun booth there, as well as many experts who can talk about HPC on Sun platforms.

rajat: A question to participants: What are your impressions of the book? Are there any topics missing that should be covered or topics that we should have covered in more detail?

test: Why is pointer alias analysis important in C programs and what options, if any, are available in Sun C compilers?

ilya: Answer to the pointer aliasing question: In C programs, pointer variables can point to overlapping regions of memory, leading to ambiguous data dependencies in the program. As a result, the compiler treats operations through potentially aliased pointers conservatively. This may lead to suppressing many optimizations that otherwise could be performed, and to additional load and store instructions in the code. New Sun compilers include two options that can improve the compiler's alias disambiguation analysis. The -xrestrict option tells the compiler that function arguments point to non-overlapping memory regions. A more powerful option, -xalias_level, can be used to specify various levels of aliasing of user data types in the code.
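
The aliasing problem can be illustrated with a small C sketch. It uses the C99 `restrict` qualifier, which is a source-level way of making a per-pointer promise similar to the one -xrestrict makes from the command line; it is not the book's example, and the function names are hypothetical.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume out may overlap x or y,
   so each store to out[i] could change values it still needs to load. */
void axpy_may_alias(size_t n, double alpha,
                    const double *x, const double *y, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = alpha * x[i] + y[i];
}

/* With restrict, the caller promises the three arrays do not overlap,
   which leaves the compiler free to reorder loads and stores. */
void axpy_restrict(size_t n, double alpha,
                   const double *restrict x, const double *restrict y,
                   double *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = alpha * x[i] + y[i];
}
```

The promise is the programmer's responsibility: calling `axpy_restrict` with overlapping arrays is undefined behavior, just as asserting non-overlap through a compiler option would be unsafe for code that really does alias.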

Bill Walster: Do you have anything in your book about how to get performance from Java codes? There continues to be a lot of interest in Java and I know of at least one person who is working on techniques to get better performance from Java applications than one might think is possible.

rajat: Bill: The book is primarily written for tuning applications written in Fortran and C. Although a lot of techniques can be applied to C++ programs, we do not cover any optimization methods for Java programs.

test: Do you discuss different parallelization approaches?

rajat: Yes: we have chapters that discuss explicit multithreading (P-threads), compiler-directive (OpenMP) parallelization and message-passing (MPI). For each of these approaches, we discuss the advantages and disadvantages as well as specific programming models.
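
For readers unfamiliar with the compiler-directive style, here is a minimal OpenMP sketch in C. The function name `dot_product` is hypothetical and the example is not from the book. When the file is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```c
#include <stddef.h>

/* Dot product with an OpenMP reduction: each thread accumulates a
   private partial sum, and the partial sums are combined at the end. */
double dot_product(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    long count = (long)n;  /* signed loop index, accepted by older OpenMP versions */

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < count; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The same loop written with P-threads would need explicit thread creation, work partitioning, and a join, and an MPI version would distribute the arrays across processes and combine partial sums with a message-passing reduction. That difference in programming effort is part of the trade-off among the three approaches.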

test: What do the terms spatial and temporal locality mean?

ilya: These terms refer to the way in which memory accesses take place in a program. Temporal locality implies that a data item used now is likely to be used again soon, while spatial locality implies that if a data item is referenced, then data in a neighboring location is also likely to be used in the computation.
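
A short C sketch can make the two terms concrete. It assumes a matrix stored in row-major order in a flat array; the function names are hypothetical.

```c
#include <stddef.h>

/* Row-order traversal: consecutive iterations touch adjacent memory,
   so each cache line fetched is fully used (good spatial locality). */
double sum_row_order(size_t rows, size_t cols, const double *m)
{
    double total = 0.0;  /* reused on every iteration (temporal locality) */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += m[i * cols + j];
    return total;
}

/* Column-order traversal of the same row-major data: consecutive
   iterations are cols elements apart, so spatial locality is poor. */
double sum_column_order(size_t rows, size_t cols, const double *m)
{
    double total = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            total += m[i * cols + j];
    return total;
}
```

Both functions visit every element exactly once. For matrices much larger than the cache, the row-order version typically runs faster because it uses each fetched cache line completely before moving on.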

adele: This is all we have time for today. Thank you very much for the participation from our audience. Rajat and Ilya, do you have any parting comments?

ilya: Thank you all for your participation and for the questions! Further questions about our book or Sun products for HPC can be sent to blueprints@sun.com. We'll be glad to respond. Thanks again!

rajat: Thank you, Adele, for moderating the chat, and thanks to all the participants for asking insightful questions. We hope that you find the book useful and would very much like to get your feedback. Specifically, we would like to know what topics should be covered in additional detail and what topics (if any) do not belong in the book at all. Also, if you find any bugs or errors in the code samples, we would appreciate hearing about them so we can correct them.

adele: Rajat and Ilya, thank you very much for being our guests today. Your book, "Techniques for Optimizing Applications: High Performance Computing" is reviewed on Solaris Developer Connection at http://soldc.sun.com. We wish you great success with it. The transcript for this chat will be available at http://soldc.sun.com/developer/chat/.
