Profile feedback is
a useful mechanism for providing the compiler with information about
how a code behaves at runtime. Having this information can lead to
significant improvements in the performance of the application. As
with all optimisations, it is only worth using profile feedback if it
does produce a gain in performance.
Some degree of care is required in selecting representative workloads for providing training data to the compiler. The representativeness of the workloads can be examined by comparing the profiles gathered by tools such as tcov or the Performance Analyzer.

Introduction

When an application is compiled, the compiler will do its best to select the optimal instructions to use and the optimal layout for the code. It has to make decisions based on the source code, but the source code contains no information about the dynamic behaviour of the code; so the compiler has to use heuristics to provide a 'best-guess'. The heuristics are used to determine how to structure the code, which routines should be inlined, which bits of code are executed frequently, and many other details. An example of one of the problems that the compiler faces is the code shown in Example 1:
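The listing for Example 1 is not reproduced in this copy. As a minimal sketch of the kind of code being discussed, assuming two alternative code paths labelled A and B, something like the following would raise the same question. The names `path_a`, `path_b`, and `choose` are hypothetical and are not taken from the original listing.

```c
/* Hypothetical stand-ins for the two alternatives, A and B. */
static int path_a(int x) { return x + 1; }
static int path_b(int x) { return x - 1; }

/*
 * The compiler must decide how to lay out the two branches.
 * From the source alone it cannot tell whether 'condition'
 * is usually true, usually false, or evenly split.
 */
int choose(int condition, int x)
{
    if (condition) {
        return path_a(x);   /* A */
    } else {
        return path_b(x);   /* B */
    }
}
```

If `condition` is true on almost every call, laying out A as the fall-through path is likely to be the better choice, but nothing in the source tells the compiler that.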
Code Example 1

In the code shown in Example 1 the compiler has an interesting decision to make, because there are different ways of structuring the code. Should the compiler make A or B the default (and therefore make that path faster), or should it structure the code so that both branches have equal performance? The question of how best to arrange if statements is one of the many decisions affecting code layout that the compiler has to make. Examples of other decisions are:
Most of these decisions can only be governed by heuristics since the compiler has no information about what happens to the program at runtime. However, there is a mechanism, called profile feedback, which enables the compiler to gather information about what happens to the program at runtime using data from a run of a representative 'training' dataset.

Using profile feedback on the benchmark suite SPEC CPU2000 leads to an average performance gain of about 7% for the floating point suite and about 16% for the integer suite. For individual codes within the suite, the performance gains vary from no gain for some codes to significant gains for others.

Building with profile feedback

The idea with profile feedback is to run the program for a short time and gather data about what happens to the program during that run. The compiler then uses that data to refine its optimisation decisions. The process of using profile feedback is:
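Assuming the Sun Studio `cc` driver and the -xprofile flag described later in this article, the cycle can be sketched as an instrumented build, a short training run, and a rebuild that uses the collected profile. The file name `app.c`, the profile name `./app_profile`, the `-xO4` optimisation level, and the helper functions below are illustrative assumptions rather than part of the original article.

```c
/*
 * Hypothetical build sequence, assuming this file is app.c and that
 * a main() (omitted here) passes argv[1] through iterations_from_arg()
 * to run_workload():
 *
 *   cc -xO4 -xprofile=collect:./app_profile -o app app.c   (1) instrumented build
 *   ./app 1000                                             (2) short training run
 *   cc -xO4 -xprofile=use:./app_profile -o app app.c       (3) rebuild using the profile
 */
#include <stdlib.h>

/* Parse an iteration count; return 'fallback' if the argument is
 * missing, not a whole number, or not positive. */
long iterations_from_arg(const char *arg, long fallback)
{
    char *end;
    long n;

    if (arg == NULL) {
        return fallback;
    }
    n = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || n <= 0) {
        return fallback;
    }
    return n;
}

/* Stand-in workload: counts the odd loop indices. The branch inside
 * the loop is the kind of behaviour a training run would record. */
long run_workload(long iterations)
{
    long i;
    long odd = 0;

    for (i = 0; i < iterations; i++) {
        if (i & 1) {
            odd++;
        }
    }
    return odd;
}
```

Taking the iteration count from the command line lets the training run in step 2 be much shorter than a production run while still exercising the same code paths.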
This means that, using profile feedback, the build process will take about twice as long as building without it. This is because the build involves two passes through the compiler, plus a short run of the application. It is therefore important that the gains in performance seen at runtime are worth the extra build complexity.

Selecting a representative workload

Building with profile feedback requires that a 'representative' training workload is used to inform the compiler about the runtime behaviour of the application. The key points about this workload are:
A concern that is sometimes raised is whether using the wrong training workload might lead to worse performance for some cases. This is possible, but it typically comes about for one of two reasons:
In both cases it may be that adding another training workload will improve the performance for the problem workload. It is also worth looking at the code coverage or time spent in the various routines so that the reason for the difference in performance can be identified. It is rare that training for one workload will force another workload to run slower. It is more likely that the training data has indicated to the compiler that a particular optimisation is unnecessary. Using additional training data which provides evidence that the optimisation is necessary will then improve performance for the problem workload whilst not impacting performance for other workloads.

The benefits of profile feedback

The more information that the compiler has, the better job it can do at optimising the application. As with all optimisations, some codes will benefit greatly and other codes will see no gains; it is strongly dependent on the type of code. The type of code that is likely to benefit from profile feedback is code which has a large number of conditional statements (if statements). The largest benefit will be had by codes whose behaviour is very predictable, but where that behaviour is not obvious to the compiler.

A simple example of this kind of code is where there are checks for correct values. The compiler cannot easily determine whether the programmer expects the checks to pass or fail, so will typically make the null assumption that passing and failing are equally likely. However, if the test is for 'valid data' and most of the time the values in the code are valid, then profile feedback will enable the compiler to identify this, and optimise the code appropriately.

Another situation where profile feedback can lead to performance gains is when the profile can be used to select the best set of routines for the compiler to inline.
There are two benefits from inlining: the first is to eliminate the cost of the call to the routine, and the second is to expose further opportunities for optimisation. The downside of inlining is that it can lead to an increase in code size; if the inlined code turns out not to be useful, then this increase in code size may actually reduce performance. Profile feedback enables the compiler to correctly select the routines which are frequently called and are therefore candidates for inlining, whilst rejecting routines which are rarely called.

The profile feedback compiler flags

The flag that tells the compiler either to build the application and collect a profile, or to build the application and use an existing profile, is -xprofile. The use of the flag has some subtleties which require a bit more explanation.
Specifying other compiler flags with profile feedback

When the application is compiled with -xprofile=collect, to collect profile information, the binary is produced with a lower level of optimisation than would otherwise occur, so that the data gathered is more detailed than the data that would be gathered using an optimised binary. The instrumented binary produced will have a particular layout of the code depending on both the source code and the flags used to build it. If the flags are changed, the layout of the code may change.
Running the executable to collect profile information

When the executable is run, the profile data is written into the file system. The write takes place at the end of the run, so if the application fails to run to completion, there may well be no profile data written. If the application is run multiple times, the profile data accumulates the results from all the runs.

If the source code is modified, it is not a good idea to reuse old profile data. The compiler may not complain or report an error, but it is unlikely to be making optimal decisions.
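Because the profile is written at the end of the run, a training run of a long-lived program needs some way to finish normally rather than being killed. One possible approach is sketched below, under the assumption that the profile data is written as part of ordinary process exit: a SIGTERM handler sets a flag, and the main loop checks the flag and returns. The names `stop_requested`, `install_stop_handler`, and `serve_until_stopped` are hypothetical.

```c
#include <signal.h>

/* Set from the signal handler; polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int signo)
{
    (void)signo;            /* unused */
    stop_requested = 1;
}

/* Install the handler for SIGTERM. Returns 0 on success, -1 on failure. */
int install_stop_handler(void)
{
    return signal(SIGTERM, request_stop) == SIG_ERR ? -1 : 0;
}

/* Stand-in for a service loop: handles up to max_requests "requests",
 * but stops early once a stop has been requested, so that the caller
 * can return from main() and let the process exit normally. */
long serve_until_stopped(long max_requests)
{
    long handled = 0;

    while (!stop_requested && handled < max_requests) {
        handled++;          /* real work would go here */
    }
    return handled;
}
```

With this arrangement, ending the training run with `kill -TERM` lets the program leave its loop and exit in the usual way instead of terminating abruptly.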
Compiler options that use data collected by profile feedback

There are several compiler options which use profile feedback information:
Example code using profile feedback

The code shown in Example 2 has opportunities for improvement to code layout from profile feedback. From inspection of the code it is obvious that the time is spent calling function f. This function sums up the six values passed into it, but before performing the sum it checks that each of the pointers to the values is valid. In the example, all the values are valid, and for most checks of this kind found in programs, it is usual for the data to be valid. However, the compiler cannot identify that the tests will usually pass, so it has to assume that the two outcomes of the if statement are equally likely.
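The listing itself is not reproduced in this copy. The following is a minimal sketch consistent with the description above, not the original code: the parameter names, the return value of 0 for an invalid pointer, and the driver function `run` are all assumptions made for illustration.

```c
#include <stddef.h>

/* Sum six values, but only after checking that every pointer is
 * valid. Returning 0 on failure is a hypothetical choice. */
int f(const int *p1, const int *p2, const int *p3,
      const int *p4, const int *p5, const int *p6)
{
    if (p1 != NULL && p2 != NULL && p3 != NULL &&
        p4 != NULL && p5 != NULL && p6 != NULL) {
        return *p1 + *p2 + *p3 + *p4 + *p5 + *p6;
    }
    return 0;
}

/* Hypothetical driver: calls f repeatedly with pointers that are
 * always valid, so the check succeeds on every iteration. */
long run(long iterations)
{
    int v[6] = {1, 2, 3, 4, 5, 6};
    long total = 0;
    long i;

    for (i = 0; i < iterations; i++) {
        total += f(&v[0], &v[1], &v[2], &v[3], &v[4], &v[5]);
    }
    return total;
}
```

In a sketch like this the check never fails, which is exactly the kind of predictable behaviour that is invisible in the source but obvious in a profile.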
Code Example 2 - Demonstrating performance gains with profile feedback

Example 3 shows the results of compiling and running this program without profile feedback.
Code Example 3 - Compiling and running without profile feedback

Example 4 shows the process of compiling this code with profile feedback. Notice that there is a training run of the program using far fewer iterations of the main loop.
Code Example 4 - Compiling and running with profile feedback

The 10 second difference in runtime between the two codes represents about a 25% improvement. Obviously this particular example has been put together to demonstrate profile feedback optimisations, but the principles that it shows appear in most codes.

About the Authors

Darryl Gove is a senior staff engineer in Compiler Performance Engineering at Sun Microsystems Inc., analyzing and optimizing the performance of applications on current and future UltraSPARC systems. Darryl has an M.Sc. and Ph.D. in Operational Research from the University of Southampton in the UK. Before joining Sun, Darryl held various software architecture and development roles in the UK.

Chris Aoki is an engineer in Sun's SPARC compiler backend team. He has worked on code generation and optimization in several generations of compiler technology at Sun. His current projects primarily involve compiler and runtime support for feedback based optimization.

(Page last updated September 7, 2005)