
Best Practices - Part 3. Best Practices for Application Load Testing and Profiling for Performance

Contents
 
Load Testing
Load-Balancing, Configurations, and Performance Implications
Failover and Scalability
Load-Testing Tools
Profiling
Resources
 
Load testing for performance is an important part of an application's development cycle; profiling is necessary to ensure that the Java 2 Platform, Enterprise Edition (J2EE platform) application performs optimally under load. This article discusses performance implications of various load-balancing and high-availability configurations that can be set up on Sun ONE Application Server (formerly iPlanet Application Server). It also identifies some tools that can help in the load-testing process, and tools that assist in obtaining profile data.
 
Load Testing
Load testing is an essential, but often neglected, part of an application development cycle. It shows how your software scales and which components or tiers in your application need to be tuned or profiled for performance bottlenecks, which in turn helps you get more out of every processor on which your application runs. The sizing information it produces also helps customers understand the hardware requirements for the load they expect an application to service. Moreover, load-related bugs such as memory leaks may go unnoticed during regular functional testing, which pays little attention to them.
 

There are at least two types of load testing:

  • Stress testing
  • Real-world simulation
Both of these assume knowledge of the actions that a user performs with the application. Stress testing simulates users interacting with the application without any think times. It is important for analyzing performance bottlenecks and for establishing the maximum load your hardware or network can take. In a real-world simulation, the goal is to measure the number of concurrent users the application can support. Not all users of the application are active simultaneously, and they have varied think times. The main use of real-world simulation is to furnish sizing information for customers who want to buy the application.
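The difference between the two kinds of test can be sketched as a small load generator. The Java class below is only an illustration, written against a current JDK rather than the JDK 1.3.1 mentioned later in this article; VirtualUsers, run, and thinkMillis are hypothetical names, and the action passed in stands in for a real request. A think time of zero approximates a stress test, and a nonzero think time approximates a real-world simulation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical load-generator sketch; not part of any Sun ONE tool.
class VirtualUsers {

    // Runs `users` concurrent virtual users. Each performs `iterations`
    // actions and pauses for `thinkMillis` after every action.
    // Returns the number of actions completed.
    static int run(int users, int iterations, long thinkMillis, Runnable action) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    action.run();
                    completed.incrementAndGet();
                    if (thinkMillis > 0) {
                        try {
                            Thread.sleep(thinkMillis); // simulated think time
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return; // stop this virtual user early
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        Runnable fakeRequest = () -> { }; // stand-in for a real HTTP request
        System.out.println("stress-test actions:    " + run(5, 20, 0, fakeRequest));
        System.out.println("simulated-user actions: " + run(5, 20, 5, fakeRequest));
    }
}
```

Both runs complete the same number of actions; what changes is how densely those actions arrive at the server, which is why the two test types answer different questions.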
 

A typical J2EE software environment with Sun ONE Application Server has the following tiers:

  • Load balancer: The load balancer routes a client request to a suitable Web connector for Sun ONE Application Server, based on a configured load-balancing policy.

  • Web connector: The Web connector machine in turn sends the request to a Sun ONE Application Server instance. This behavior can be configured in many ways. For example,

    • A load-balancing policy, based on a round-robin scheme or on response time (with variations), can be executed.
    • The Web connector can forward the request to the Sun ONE Application Server instance with which it is associated.

    Note: The Web connector is the point of entry for Web-based requests into the Application Server. This means that it will also be the single point of failure that could bring down your system, unless your configuration is set up to use a third-party load balancer--for example, Cisco's Local Director, Zeus, or Resonate.

  • Sun ONE Application Server instance: Sun ONE Application Server has a database back end or legacy system connectivity.

  • Database or legacy system

Preliminary load testing with system monitoring will help locate which tier is the bottleneck. That tier can then be the target of further profiling or tuning.
Load-Balancing, Configurations, and Performance Implications

Sun ONE Application Server has many load-balancing policies, the setup of which may affect the behavior of an application under load, which in turn affects performance. Sun ONE Application Server offers the following load-balancing policies:

  • Round-robin scheme
  • Response time
  • Load balancing based on Sun ONE Application Server instance (Executive Server or KXS)
  • Sticky load balancing
Round-Robin Policy
Round-robin load balancing is handled at the Web connector level, with two variations. In the simple scheme, consecutive requests are routed to different application server instances. In the weighted round-robin configuration, server weights determine how many consecutive requests an application server instance receives before the Web connector moves on to the next instance in its list; that instance in turn receives as many requests as its own weight allows, and so on through the list.
 
The weighted round-robin configuration is useful when the Sun ONE Application Server instances are on different hardware configurations, and consequently some instances can handle more load than others.
 
Round-robin load balancing, however, is static and does not take into account response time under load. The advantage is that the policy has a relatively lightweight implementation because of its static nature.
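The weighted scheme described above can be sketched in a few lines of Java. This is an illustration of the general technique, not the Web connector's actual implementation; the class name WeightedRoundRobin and the instance names are hypothetical. Giving every instance a weight of 1 yields the simple round-robin variation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of static weighted round-robin selection.
class WeightedRoundRobin {
    private final List<String> instances = new ArrayList<>();
    private final List<Integer> weights = new ArrayList<>();
    private int current = 0;        // index of the instance now receiving requests
    private int sentToCurrent = 0;  // consecutive requests already sent to it

    WeightedRoundRobin(Map<String, Integer> weightByInstance) {
        for (Map.Entry<String, Integer> e : weightByInstance.entrySet()) {
            if (e.getValue() < 1) {
                throw new IllegalArgumentException("weight must be >= 1");
            }
            instances.add(e.getKey());
            weights.add(e.getValue());
        }
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
    }

    // Returns the instance that should receive the next request.
    synchronized String next() {
        if (sentToCurrent == weights.get(current)) {
            // The current instance has used up its weight; move down the list.
            current = (current + 1) % instances.size();
            sentToCurrent = 0;
        }
        sentToCurrent++;
        return instances.get(current);
    }

    public static void main(String[] args) {
        Map<String, Integer> weightByInstance = new LinkedHashMap<>();
        weightByInstance.put("big-box", 2);   // assumed to handle twice the load
        weightByInstance.put("small-box", 1);
        WeightedRoundRobin policy = new WeightedRoundRobin(weightByInstance);
        for (int i = 0; i < 6; i++) {
            System.out.println(policy.next());
        }
    }
}
```

Note that the selection depends only on the counters, never on observed response times, which is what makes the policy both lightweight and static.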
 
Response-Time Policy
Just as in round-robin load balancing, there are two variations to response-time-based load balancing: load balancing based on server response time; and load balancing based on component response time. Both types of load balancing are handled by the Web connector.
 
When load balancing is based on server response time, the Web connector sends the first 128 requests to a Sun ONE Application Server instance and sends the next 128 to another Sun ONE Application Server instance, and so on, until it has exhausted the list of application server instances. Based on the average response time of each server, the Web connector sends all subsequent requests to the server with the best response time. The Web connector maintains an average response time, calculated during a time window, for the server to which it sends requests, and it starts sending requests to the next-best server (as it previously calculated) if the server currently servicing requests does not provide the best response time.
 
Server-response-time load balancing may not be optimal in situations where different components in an application may have different response times on different instances; some instances may have a "good" response time for some components, and those components may have a substantially higher response time on other instances. An assumption made in this policy is that the first 128 requests are a good representative sample of all types of client requests that the deployed application would receive.
 
When load balancing is based on component response time (the default configuration), the first problem is remedied, and the Web connector maintains statistics about which instance is the "best" to service a given component.
 
Response-time load balancing, however, has a higher performance overhead than that of round-robin load balancing. This is mainly due to the process of collecting and maintaining statistics about the most preferred server.
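A rough Java sketch of server-response-time selection follows. It is a simplification under stated assumptions: it keeps a cumulative average instead of the time-windowed average the Web connector maintains, and the names ResponseTimeBalancer, choose, and record are hypothetical. Only the sample size of 128 comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of response-time-based instance selection.
class ResponseTimeBalancer {
    static final int WARM_UP_REQUESTS = 128; // sample size named in the text

    private static final class Stats {
        long count;
        long totalMillis;
        double average() { return (double) totalMillis / count; }
    }

    private final Map<String, Stats> statsByInstance = new LinkedHashMap<>();

    ResponseTimeBalancer(String... instances) {
        if (instances.length == 0) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        for (String name : instances) {
            statsByInstance.put(name, new Stats());
        }
    }

    // Picks the instance for the next request.
    synchronized String choose() {
        String best = null;
        double bestAverage = Double.MAX_VALUE;
        for (Map.Entry<String, Stats> e : statsByInstance.entrySet()) {
            Stats s = e.getValue();
            if (s.count < WARM_UP_REQUESTS) {
                return e.getKey(); // still sampling this instance
            }
            if (s.average() < bestAverage) {
                bestAverage = s.average();
                best = e.getKey();
            }
        }
        return best;
    }

    // Records the observed response time of a completed request.
    synchronized void record(String instance, long millis) {
        Stats s = statsByInstance.get(instance);
        if (s == null) {
            throw new IllegalArgumentException("unknown instance: " + instance);
        }
        s.count++;
        s.totalMillis += millis;
    }

    public static void main(String[] args) {
        ResponseTimeBalancer balancer = new ResponseTimeBalancer("ias1", "ias2");
        for (int i = 0; i < 2 * WARM_UP_REQUESTS; i++) {
            String target = balancer.choose();
            balancer.record(target, target.equals("ias1") ? 40 : 25); // made-up timings
        }
        System.out.println("preferred instance: " + balancer.choose());
    }
}
```

The bookkeeping in record is the overhead the text refers to. A component-response-time variant would keep one such table per component rather than one for the whole server, which multiplies that bookkeeping.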
 
Sun ONE Application Server Policy
Load balancing through Sun ONE Application Server instances is handled by the Web connector and the Sun ONE Application Server instance together. This method is deprecated.
 
The Web connector sends a request to a default instance, and the KXS process in that instance maintains statistics about its performance and the performance of other Sun ONE Application Server instances. The KXS sends the request back to the Web connector (marked with information about which server to redirect the request to) if it does not have the best response time, and the Web connector reroutes the request. The same cycle repeats for the second instance, if it turns out not to be the best performer, until a predefined hop-count is reached, after which the instance that has the request will process it. As you can judge, load balancing based on Sun ONE Application Server can present a lot of overhead, and this may adversely affect performance.
 
Sticky Load Balancing
Sticky load balancing can be a performance enhancement option for applications that may have large sessions. Marking such sessions as distributed can have a performance overhead. Sticky load balancing ensures that a KJS that processes a request for a component marked sticky will be the one to process further requests for the same components and other components marked sticky. This processing method can improve performance since it avoids session serialization and improves performance for result caching.
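The idea behind stickiness can be illustrated with a small Java sketch. It is a simplification: the product applies stickiness to components marked sticky, whereas this sketch simply pins each session ID to one engine, and the names StickyRouter and route are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: pin each session to the engine that served it first.
class StickyRouter {
    private final String[] engines;
    private final Map<String, String> engineBySession = new ConcurrentHashMap<>();
    private final AtomicInteger nextIndex = new AtomicInteger();

    StickyRouter(String... engines) {
        if (engines.length == 0) {
            throw new IllegalArgumentException("at least one engine is required");
        }
        this.engines = engines.clone();
    }

    // The first request of a session is assigned round-robin; every later
    // request of the same session goes back to the same engine.
    String route(String sessionId) {
        return engineBySession.computeIfAbsent(sessionId,
                id -> engines[Math.floorMod(nextIndex.getAndIncrement(), engines.length)]);
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter("kjs1", "kjs2");
        System.out.println(router.route("session-A"));
        System.out.println(router.route("session-B"));
        System.out.println(router.route("session-A")); // same engine as the first call
    }
}
```

Because the session never leaves its engine, its data does not have to be serialized and shipped elsewhere between requests.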
 
Failover and Scalability
A factor that affects the performance of an application is its failover requirement. Sun ONE Application Server clusters provide failover support, and Distributed Data Synchronization (DSync) is used to achieve session and state failover. Failover support (for maintaining 24/7 uptime) and session availability or redundancy take up CPU cycles that would otherwise be available for processing user requests. The performance overhead depends on an application's failover requirement. The advantages and disadvantages of the two types of session management--lite sessions and distributed sessions--are discussed below.
 
Lite Sessions
Lite sessions are stored at the KJS level and are not replicated or made available to other containers (KJSes). Lite sessions are easy to use and have the least performance overhead associated with them, but session data can be lost during failover to another KJS if you have sharable session data and have not activated sticky load balancing.
 
Distributed Sessions
When distributed sessions are used, session data is stored in a KXS and can be made available to any KJS within a particular Sun ONE Application Server instance. Thus, session information is not lost if a KJS goes down. This approach, however, introduces the overhead of transferring a session between the KJS and KXS. The performance overhead mainly depends on the size of the session that needs to be serialized.
 

Distributed sessions are of two types:

  • DSync-local: In a DSync-local session, a KXS shares data with KJSes that belong to the same instance.

  • DSync-distributed: In a clustered setup, DSync-distributed is used and the KXS hosting session data shares data with other instances.
Distributed sessions also impose the additional overhead of backing the data to a backup instance(s). This backup can be configured according to the deployment site's needs.
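Because the overhead of distributed sessions depends mainly on the size of the serialized session, it can be useful to estimate that size before choosing a session type. The Java sketch below uses standard Java object serialization as an approximation; whether DSync produces exactly the same byte count is an assumption, and SessionSize and serializedBytes are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical helper for estimating how large a session is when serialized.
class SessionSize {

    // Serializes the object into memory and returns the number of bytes written.
    static int serializedBytes(Serializable sessionData) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(sessionData);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        HashMap<String, String> small = new HashMap<>();
        small.put("user", "alice");

        HashMap<String, String> large = new HashMap<>(small);
        for (int i = 0; i < 1000; i++) {
            large.put("cartItem" + i, "description of item " + i);
        }

        System.out.println("small session bytes: " + serializedBytes(small));
        System.out.println("large session bytes: " + serializedBytes(large));
    }
}
```

A session that serializes to many kilobytes is a candidate for trimming, or for lite sessions with sticky load balancing if the application can tolerate losing it on failover.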
 
Load-Testing Tools
Commercially available load-testing software makes stress testing and real-world simulation easier. The advantages of using these tools over homemade load generators are manifold. They provide detailed data of many performance metrics (throughput, response time, server statistics, and others) and also can plot them on charts. The languages in which load-testing scripts must be written are relatively simple, and wizards automate the scripting if one does not want to learn the language. Testing parameters (HTTP version, connection keep-alives, and so forth) are easily configurable, as is the selection of test completion criteria. Using a commercially available load tester also helps if the results are to be published outside the organization.
 
Profiling
The answer to the question "What do I do when my load testing gives me performance numbers that are not good enough?" lies in profiling. The aim of profiling is a better-tuned application: one that uses its computational resources as fully and efficiently as possible and scales well. In practice, business reasons usually define the outcome; it is sufficient if the application can handle as much load as the customer requires, or at least as much as its competitors.
 

Two steps lead to better performance through profiling; they must be performed iteratively:

  • Load testing
  • Profiling (includes tuning, based on information obtained during profiling)

After you have decided that your application performance is not as expected, profiling can elicit clues about performance problems. Some guidelines for profiling are:

  • Decide which tier of your application to profile.
  • Use a profiling tool to obtain data in the bottlenecking tier.
  • Tune, in accordance with data obtained.
Commercial tools exist for profiling--Borland Optimizeit Profiler and Wily Technology's Introscope are examples.
 
Deciding Which Tier to Profile
You can usually decide which tier to profile by monitoring the system under test at the operating-system level. OS tools such as mpstat and prstat can provide information about how loaded the system is, which processes are taking CPU slices, what the process size is, and so forth. Mutex contentions, context switches, and high wait times all indicate tuning possibilities. The proc tools (in /usr/proc/bin on machines using the Solaris 8 Operating Environment) also help with other information on processes. System-level tuning includes OS tunables in /etc/system and through the ndd command. Many interesting Java Virtual Machine tunables are available as -X and -XX options to the java command; in Sun ONE Application Server, these must be specified in JAVA_ARGS in the file ias-install-directory/ias/env/iasenv.ksh.
 
The value of adjusting these tunables depends on the application, so the sequence would be to profile first; tune the application; and then tune the application server, virtual machine for the Java platform (JVM) software, and OS. Inserting timers into your application may also help you spot initial bottlenecks. At the application-server level, the KXS logs are a good place to look for information about how long a request took to process. You can enable such logging on the Web connector by setting the value of the registry key SOFTWARE\iPlanet\Application Server\6.0\CCS0\HTTPAPI\NASRespTime to 1. Note that the uncertainty rule applies: measurements are costly, and the presence of the logger itself may skew them.
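As an illustration of inserting timers, the following Java sketch wraps a piece of work and reports how long it took. Timed and time are hypothetical names, the sketch uses current Java syntax rather than JDK 1.3.1, and writing to standard error is only a placeholder for whatever logging the application already has.

```java
import java.util.function.Supplier;

// Hypothetical timing helper for spotting initial bottlenecks.
class Timed {

    // Runs the work, prints the elapsed time, and returns the work's result.
    static <T> T time(String label, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            System.err.println(label + " took " + micros + " us");
        }
    }

    public static void main(String[] args) {
        long sum = time("sum of 1..1000000", () -> {
            long total = 0;
            for (int i = 1; i <= 1_000_000; i++) {
                total += i;
            }
            return total;
        });
        System.out.println("result: " + sum);
    }
}
```

The timer itself costs something on every call, so it is best kept around coarse operations such as a whole request or a database call, and removed or disabled once the bottleneck tier is known.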
 
Using a Profiling Tool to Obtain Data
The Profiler analysis helps you spot performance bottlenecks. Execution times of different methods, monitor contentions, and object allocations are important figures to look at during profiling. The aim of profiling is to make sure that relatively high amounts of time are not being spent in a certain body of code. Common issues uncovered during profiling are heavy String manipulation, file reads that do one byte at a time instead of buffered reads, and extensive file I/O.
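Two of the issues named above have well-known remedies, sketched here in Java. The class and method names are hypothetical. StringBuilder is used because the sketch targets a current JDK; on JDK 1.3.1 the equivalent class would be StringBuffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical examples of fixes for common profiler findings.
class ProfilerFindings {

    // Heavy String manipulation: every += allocates a new String and copies
    // everything accumulated so far.
    static String joinSlow(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Same result with a single growing buffer.
    static String joinFast(String[] parts) {
        StringBuilder result = new StringBuilder();
        for (String part : parts) {
            result.append(part);
        }
        return result.toString();
    }

    // Byte-at-a-time reads: wrapping the source in a BufferedInputStream means
    // most read() calls are served from memory instead of the underlying stream.
    static long countBytes(InputStream source) {
        long count = 0;
        try (InputStream in = new BufferedInputStream(source)) {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String[] parts = {"load", "-", "test"};
        System.out.println(joinSlow(parts).equals(joinFast(parts)));
        System.out.println(countBytes(new ByteArrayInputStream(new byte[4096])));
    }
}
```

In a profile, the slow variants typically show up as many allocations of String or char[] objects, or as a large number of samples in a native read method.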
 
Knowing how much time is spent in each method may show that a method needs to be rewritten with a different algorithm, when the bottleneck lies in an inefficient algorithm rather than in a data structure. Adherence to good Java programming practices also helps. The JDK 1.3.1 version shipped with Sun ONE Application Server 6.5 implements the Java Virtual Machine Profiler Interface (JVMPI).
 
HPROF, a profiler shipped with the JDK software, generates helpful profiling information. One way to enable HPROF is to add -Xrunhprof:cpu=samples,file=/tmp/my_profile.out to JAVA_ARGS in your iasenv.ksh. HPROF can be configured in many other ways; see the HPROF documentation for details.
 
The following is a sample dynamic stack trace from HPROF.
 
TRACE 116: 
        java.lang.Class.forName0(Class.java:Native method)
        java.lang.Class.forName(Class.java:120)
        com.sun.corba.ee.internal.corba.ServerDelegate.class$(ServerDelegate.java:83)
        com.sun.corba.ee.internal.corba.ServerDelegate.getClientSubcontractClass(ServerDelegate.java:126)
TRACE 240:
        com.sun.corba.ee.internal.POA.POACurrent.peekThrowNoContext(POACurrent.java:169)
        com.sun.corba.ee.internal.POA.POACurrent.get_object_id(POACurrent.java:64)
        com.sun.corba.ee.internal.POA.DelegateImpl.this_object(DelegateImpl.java:36)
        org.omg.PortableServer.Servant._this_object(Servant.java:64)
TRACE 28: 
        com.kivasoft.gds.GDSKey.getValuenative(:Native method)
        com.kivasoft.gds.GDSKey.getValue(:Unknown line)
        com.kivasoft.util.GX.GDSGetKeyValueString(:Unknown line)
        com.kivasoft.util.GX.GDSCurrentGetString(:Unknown line)
TRACE 11: 
        java.util.jar.Attributes.putValue(Attributes.java:147)
        java.util.jar.Attributes.read(Attributes.java:365)
        java.util.jar.Manifest.read(Manifest.java:202)
        java.util.jar.Manifest.<init>(Manifest.java:56)
 
The trace numbers are important for correlating a stack with a certain heap profile or a CPU usage profile. The part of the profile that shows CPU usage for the same application is shown next.
 
CPU SAMPLES BEGIN (total = 330) Fri Feb 8 15:29:48 2002 
rank  self  accum count trace method 
 1  76.97% 76.97%  876   41   com.kivasoft.thread.ThreadBasic.run 
 2  11.52% 88.48%  108  277   com.kivasoft.lcycmgr.LifeCycleMgr.waitForStoppedStatenative 
 3   1.52% 90.00%   12  493   com.kivasoft.thread.ThreadBasic.run 
 4   0.91% 90.91%    5  534   com.kivasoft.types.COM.COMClear 
 5   0.91% 91.82%    7  494   com.kivasoft.types.COM.COMClear 
 6   0.91% 92.73%    5  516   com.kivasoft.util.ValList.getValStringnative 
 7   0.91% 93.64%    6  500   java.net.SocketInputStream.socketRead 
 8   0.61% 94.24%    3  530   java.lang.Object.wait 
 9   0.61% 94.85%    3  524   com.kivasoft.types.COM.COMClear 
10   0.61% 95.45%   10  495   com.kivasoft.util.ValList.getValStringnative 
11   0.30% 95.76%    2  512   java.lang.ClassLoader.defineClass0 
12   0.30% 96.06%    3  506   com.kivasoft.types.COM.COMClear
13   0.30% 96.36%    5  319   java.lang.Class.newInstance0
14   0.30% 96.67%    1  547   oracle.jdbc.driver.OracleStatement.clearDefines 
15   0.30% 96.97%    2  499   java.lang.Object.notify 
16   0.30% 97.27%    1  548   java.lang.Thread.currentThread 
17   0.30% 97.58%    1  542   com.kivasoft.util.Stream.flushnative
18   0.30% 97.88%    4  497   java.lang.Throwable.fillInStackTrace 
19   0.30% 98.18%    1  544   java.lang.Class.newInstance0 
20   0.30% 98.48%    1  545   oracle.jdbc.driver.OracleStatement.doDefaultTypes 
21   0.30% 98.79%    2  507   sun.misc.URLClassPath.getLoader 
22   0.30% 99.09%    1  543   java.net.URLClassLoader.findResource 
23   0.30% 99.39%    2  279   com.kivasoft.bind.BinderServlet.bind
24   0.30% 99.70%    1  546   java.lang.Thread.currentThread 
25   0.30% 100.00%   2  533   sun.misc.URLClassPath.getLoader 
CPU SAMPLES END 
 
Again, note that the trace column in the preceding output can be used to connect this output to a stack trace.
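When a profile is long, the lookup from a CPU sample row to its stack trace can be scripted. The Java sketch below parses one data row in the column layout shown above (rank, self, accum, count, trace, method); HprofCpuSamples, Sample, and parseRow are hypothetical names, and the parser assumes that the columns are separated by whitespace exactly as in this sample.

```java
// Hypothetical parser for rows of the "CPU SAMPLES" section shown above.
class HprofCpuSamples {

    static final class Sample {
        final int rank;
        final double selfPercent;
        final int count;
        final int trace;
        final String method;

        Sample(int rank, double selfPercent, int count, int trace, String method) {
            this.rank = rank;
            this.selfPercent = selfPercent;
            this.count = count;
            this.trace = trace;
            this.method = method;
        }
    }

    // Parses one data row; returns null for header, BEGIN/END, or malformed lines.
    static Sample parseRow(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 6 || !f[1].endsWith("%")) {
            return null;
        }
        try {
            return new Sample(
                    Integer.parseInt(f[0]),
                    Double.parseDouble(f[1].substring(0, f[1].length() - 1)),
                    Integer.parseInt(f[3]),
                    Integer.parseInt(f[4]),
                    f[5]);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Sample s = parseRow(" 7   0.91% 93.64%    6  500   java.net.SocketInputStream.socketRead ");
        // The trace number identifies the matching "TRACE <n>:" block in the same file.
        System.out.println("look for: TRACE " + s.trace + ":  (" + s.method + ")");
    }
}
```

Searching the profile file for the printed TRACE label then leads to the call stack that produced the samples.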
 
Tuning Based on Data Obtained
CPU hot spots and excessive object creations are important candidates for streamlining. The processor usage profile can report the time spent in different methods; look for ways to minimize time that looks relatively high. Cumulative time spent in a certain method and all the methods it calls are important pieces of information. After some iterations, the profile should look tuned.
 
System-level indications of a tuned application under load include a high percentage of time in user mode, relatively low system time (absolute times depend on the application), and low mutex contention. Continue iterating through profiling, tuning, and load testing until the desired performance is reached. If performance goals are not realized with basic tuning, a rearchitecture of the software may be necessary.
 
Summary
Profiling and load testing are important parts of the application development cycle that should not be neglected.
 
Resources

For more about tools and other sources of information referred to here, consult the following resources: