Load testing for performance is an important part of an application's development cycle, and profiling is necessary to ensure that a Java 2 Platform, Enterprise Edition (J2EE platform) application performs optimally under load. This article discusses the performance implications of various load-balancing and high-availability configurations that can be set up on Sun ONE Application Server (formerly iPlanet Application Server). It also identifies some tools that can help in the load-testing process, and tools that assist in obtaining profile data.
Load Testing
Load testing is an essential, but often neglected, part of an application development cycle. It helps you understand how your software scales and helps you investigate which components or tiers in your application need to be tuned or profiled for performance bottlenecks; this assistance helps you get more out of every processor on which your application runs. The sizing information also helps customers understand the hardware requirements for the load they expect an application to service. Moreover, load-related bugs such as memory leaks may not show up during regular functional testing if adequate attention is not paid to them.
There are at least two types of load testing: stress testing and real-world simulation.
Both of these assume knowledge of what actions the user performs with the application. Stress testing is the simulation of users interacting with the application without any think times. It is important for analyzing performance bottlenecks and for establishing the maximum load your hardware or network can take. In a real-world simulation, measuring the number of concurrent users is the goal. Not all users of the application are active simultaneously, and they have varied think times. The main use of virtual client simulation is to furnish sizing information for customers who want to buy the application.
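The difference between the two kinds of test can be sketched as a small load generator in which the think time is a parameter. This is only an illustration, not a replacement for a load-testing tool: the class name VirtualUserSimulator and its parameters are assumptions made for this sketch, and the user action is supplied by the caller rather than being a real HTTP request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical load generator; the user action is supplied by the caller. */
final class VirtualUserSimulator {

    /**
     * Runs `users` concurrent virtual users. Each user performs
     * `requestsPerUser` actions and pauses `thinkTimeMillis` between them.
     * A think time of 0 corresponds to stress testing.
     * Returns the number of actions that completed without throwing.
     */
    static int run(int users, int requestsPerUser, long thinkTimeMillis, Runnable action) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                for (int r = 0; r < requestsPerUser; r++) {
                    try {
                        action.run();
                        completed.incrementAndGet();
                    } catch (RuntimeException e) {
                        // A failed action is simply not counted.
                    }
                    // Pause between requests only, not after the last one.
                    if (thinkTimeMillis > 0 && r < requestsPerUser - 1) {
                        try {
                            Thread.sleep(thinkTimeMillis);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all virtual users to finish; give up after five minutes.
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

Calling VirtualUserSimulator.run(50, 100, 0, action) approximates a stress test, while a nonzero think time with a larger user count approximates a real-world simulation.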
A typical J2EE software environment with Sun ONE Application Server has the following tiers:
Load-Balancing, Configurations, and Performance Implications
Sun ONE Application Server has many load-balancing policies, the setup of which may affect the behavior of an application under load, which in turn affects performance. Sun ONE Application Server offers the following load-balancing policies:
Round-Robin Policy
Round-robin load balancing is handled at the Web connector level, with two variations: a simple load-balancing scheme, in which consecutive requests are routed to different application server instances; and a weighted round-robin configuration, in which server weights determine how many consecutive requests an application server instance receives before the next instance in the Web connector's list gets requests. The next instance then receives requests according to its weight as known to the Web connector, and the Web connector moves on to the next instance in its list.
The weighted round-robin configuration is useful when the Sun ONE Application Server instances are on different hardware configurations, and consequently some instances can handle more load than others.
Round-robin load balancing, however, is static and does not take into account response time under load. The advantage is that the policy has a relatively lightweight implementation because of its static nature.
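The weighted variation described above can be sketched in a few lines. This is an illustration of the idea only and is not the Web connector's code; the class WeightedRoundRobin and its methods are names invented for this sketch. Giving every instance a weight of 1 yields the simple round-robin scheme.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical weighted round-robin selector; not the Web connector's actual code. */
final class WeightedRoundRobin {
    private final List<String> instances = new ArrayList<>();
    private final List<Integer> weights = new ArrayList<>();
    private int current = 0;        // index of the instance currently receiving requests
    private int sentToCurrent = 0;  // consecutive requests already sent to that instance

    /** Registers an instance; the weight must be at least 1. */
    synchronized void add(String instance, int weight) {
        if (weight < 1) {
            throw new IllegalArgumentException("weight must be >= 1");
        }
        instances.add(instance);
        weights.add(weight);
    }

    /** Returns the instance that should receive the next request. */
    synchronized String next() {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances registered");
        }
        // Once the current instance has received its share, move to the next one.
        if (sentToCurrent >= weights.get(current)) {
            current = (current + 1) % instances.size();
            sentToCurrent = 0;
        }
        sentToCurrent++;
        return instances.get(current);
    }
}
```

The selector keeps no response-time statistics, which is why this policy is static and cheap.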
Response-Time Policy
Just as in round-robin load balancing, there are two variations to response-time-based load balancing: load balancing based on server response time; and load balancing based on component response time. Both types of load balancing are handled by the Web connector.
When load balancing is based on server response time, the Web connector sends the first 128 requests to a Sun ONE Application Server instance and sends the next 128 to another Sun ONE Application Server instance, and so on, until it has exhausted the list of application server instances. Based on the average response time of each server, the Web connector sends all subsequent requests to the server with the best response time. The Web connector maintains an average response time, calculated during a time window, for the server to which it sends requests, and it starts sending requests to the next-best server (as it previously calculated) if the server currently servicing requests does not provide the best response time.
Server-response-time load balancing may not be optimal in situations where different components in an application may have different response times on different instances; some instances may have a "good" response time for some components, and those components may have a substantially higher response time on other instances. An assumption made in this policy is that the first 128 requests are a good representative sample of all types of client requests that the deployed application would receive.
When load balancing is based on component response time (the default configuration), the first problem is remedied, and the Web connector maintains statistics about which instance is the "best" to service a given component.
Response-time load balancing, however, has a higher performance overhead than that of round-robin load balancing. This is mainly due to the process of collecting and maintaining statistics about the most preferred server.
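The bookkeeping behind component-response-time balancing can be sketched as follows. This is a simplified illustration under stated assumptions: the class ResponseTimeBalancer is invented for this sketch, it keeps a cumulative average rather than the time-window average described above, and it omits the initial 128-request sampling phase.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-component response-time statistics; a simplification. */
final class ResponseTimeBalancer {
    // component -> (server -> {total milliseconds, number of samples})
    private final Map<String, Map<String, long[]>> stats = new HashMap<>();

    /** Records one observed response time for a component on a server. */
    synchronized void record(String component, String server, long millis) {
        long[] s = stats.computeIfAbsent(component, c -> new HashMap<>())
                        .computeIfAbsent(server, k -> new long[2]);
        s[0] += millis;
        s[1]++;
    }

    /**
     * Returns the server with the lowest average response time for the
     * component, or null if no samples have been recorded for it.
     */
    synchronized String best(String component) {
        Map<String, long[]> perServer = stats.get(component);
        if (perServer == null) {
            return null;
        }
        String best = null;
        double bestAverage = Double.MAX_VALUE;
        for (Map.Entry<String, long[]> e : perServer.entrySet()) {
            double average = (double) e.getValue()[0] / e.getValue()[1];
            if (average < bestAverage) {
                bestAverage = average;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Every request updates a map entry and every routing decision scans the entries for a component, which gives a sense of where the extra overhead relative to round-robin comes from.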
Sun ONE Application Server Policy
Load balancing through Sun ONE Application Server instances is handled by the Web connector and the Sun ONE Application Server instance together. This method is deprecated.
The Web connector sends a request to a default instance, and the KXS process in that instance maintains statistics about its performance and the performance of other Sun ONE Application Server instances. The KXS sends the request back to the Web connector (marked with information about which server to redirect the request to) if it does not have the best response time, and the Web connector reroutes the request. The same cycle repeats for the second instance, if it turns out not to be the best performer, until a predefined hop-count is reached, after which the instance that has the request will process it.
As you can judge, load balancing based on Sun ONE Application Server can add considerable overhead, and this may adversely affect performance.
Sticky Load Balancing
Sticky load balancing can be a performance enhancement option for applications that may have large sessions. Marking such sessions as distributed can have a performance overhead. Sticky load balancing ensures that a KJS that processes a request for a component marked sticky will be the one to process further requests for the same components and other components marked sticky. This processing method can improve performance since it avoids session serialization and improves performance for result caching.
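The routing idea behind stickiness can be sketched as a map from a session to the engine that first served it. This is an illustration only: the class StickyRouter is invented for this sketch, it keys on a session identifier rather than on components marked sticky, and the engine names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sticky router; keys on a session ID for simplicity. */
final class StickyRouter {
    private final List<String> engines;
    private final Map<String, String> engineBySession = new ConcurrentHashMap<>();
    private final AtomicInteger nextIndex = new AtomicInteger();

    StickyRouter(List<String> engines) {
        if (engines.isEmpty()) {
            throw new IllegalArgumentException("at least one engine is required");
        }
        this.engines = new ArrayList<>(engines);
    }

    /**
     * Returns the engine for the session. The first request of a session is
     * assigned round-robin; later requests go to the same engine.
     */
    String route(String sessionId) {
        return engineBySession.computeIfAbsent(sessionId,
                id -> engines.get(Math.floorMod(nextIndex.getAndIncrement(), engines.size())));
    }
}
```

Because later requests always reach the engine that already holds the session in memory, the session never has to be serialized and moved.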
Failover and Scalability
A factor that affects the performance of an application is its failover requirement. Sun ONE Application Server clusters provide failover support, and Distributed Data Synchronization (DSync) is used to achieve session and state failover. Failover support for maintaining 24/7 uptime, along with session availability or redundancy, takes up CPU cycles that would otherwise be available for processing user requests. The performance overhead depends on an application's failover requirement. The advantages and disadvantages of the two types of session management--lite sessions and distributed sessions--are discussed below.
Lite Sessions
Lite sessions are stored at the KJS level and are not replicated or made available to other containers (KJSes). Lite sessions are easy to use and have the least performance overhead associated with them, but session data can be lost during failover to another KJS if you have sharable session data and have not activated sticky load balancing.
Distributed Sessions
When distributed sessions are used, session data is stored in a KXS and can be made available to any KJS within a particular Sun ONE Application Server instance. Thus, session information is not lost if a KJS goes down. This approach, however, introduces the overhead of transferring a session between the KJS and KXS. The performance overhead mainly depends on the size of the session that needs to be serialized.
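Because the overhead grows with the size of the serialized session, it can be useful to estimate that size during development. The sketch below uses standard Java serialization as a stand-in; the format the server actually uses to transfer sessions is not described here, so treat the numbers as a rough proxy. The class SessionSizeEstimator is a name invented for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper that estimates how large a session object is when serialized. */
final class SessionSizeEstimator {

    /** Returns the number of bytes standard Java serialization produces for the object. */
    static int serializedSize(Serializable sessionData) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the byte array.
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sessionData);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Passing the object that the application stores in the session, for example a HashMap of attributes, gives a quick indication of whether the session is small enough to distribute cheaply.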
Distributed sessions are of two types:
Distributed sessions also impose the additional overhead of backing up the data to one or more backup instances. This backup can be configured according to the deployment site's needs.
Load-Testing Tools
Commercially available load-testing software makes stress testing and real-world simulation easier. The advantages of using these tools over homemade load generators are manifold. They provide detailed data of many performance metrics (throughput, response time, server statistics, and others) and also can plot them on charts. The languages in which load-testing scripts must be written are relatively simple, and wizards automate the scripting if one does not want to learn the language. Testing parameters (HTTP version, connection keep-alives, and so forth) are easily configurable, as is the selection of test completion criteria. Using a commercially available load tester also helps if the results are to be published outside the organization.
Profiling
The answer to the question "What do I do when my load testing gives me performance numbers that are not good enough?" lies in profiling. The aim of profiling is to yield a better-tuned application that will perform better; that is, an application that will utilize its computational resources optimally and maximally and scale perfectly. But usually, business reasons define the outcome; it is sufficient if the application can handle as much load as the customer requires, or at least as much as its competitors.
Two steps lead to better performance through profiling; they must be performed iteratively:
After you have decided that your application performance is not as expected, profiling can elicit clues about performance problems. Some guidelines for profiling are:
Commercial tools exist for profiling--Borland Optimizeit Profiler and Wily Technology's Introscope are examples.
Deciding Which Tier to Profile
You can usually decide which tier to profile by monitoring the system under test at the operating-system level. OS tools such as mpstat and prstat can provide information about how loaded the system is, which processes are taking CPU slices, what the process size is, and so forth. Mutex contention, context switches, and high wait times all indicate tuning possibilities. The proc tools (in /usr/proc/bin on machines using the Solaris 8 Operating Environment) also provide other information about processes. System-level tuning includes OS tunables set in /etc/system and through the ndd command. Many interesting Java Virtual Machine tunables are available as the -X and -XX options to the java command in Sun ONE Application Server, and these must be specified in JAVA_ARGS in the file ias-install-directory/ias/env/iasenv.ksh.
The value of adjusting these tunables depends on the application, so the sequence would be to profile first; tune the application; and then tune the application server, virtual machine for the Java platform (JVM) software, and OS. Inserting timers into your application may also help you spot initial bottlenecks. At the application-server level, the kxs logs are a good place to look for information about how much time it took to process a request. You can enable such logging on the Web connector by setting to 1 the value of the registry key SOFTWARE\iPlanet\Application Server\6.0\CCS0\HTTPAPI\NASRespTime. Note that the uncertainty rule applies: measurements are costly and may be skewed by the presence of the logger itself.
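Inserting a timer into application code can be as simple as wrapping the suspect call. The following is a minimal sketch; the class Timed and its call method are names invented for this illustration, and the output format is arbitrary.

```java
import java.util.function.Supplier;

/** Hypothetical timing wrapper for spotting initial bottlenecks. */
final class Timed {

    /**
     * Runs the task, prints the elapsed time in microseconds to standard
     * error under the given label, and returns the task's result.
     */
    static <T> T call(String label, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            // The finally block reports the time even if the task throws.
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.err.println(label + " took " + elapsedMicros + " us");
        }
    }
}
```

As with any logger, the timer adds its own cost, so remove it or guard it with a flag once the bottleneck has been located.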
Using a Profiling Tool to Obtain Data
The Profiler analysis helps you spot performance bottlenecks. Execution times of different methods, monitor contentions, and object allocations are important figures to look at during profiling. The aim of profiling is to make sure that relatively high amounts of time are not being spent in a certain body of code. Common issues uncovered during profiling are heavy String manipulation, file reads that fetch one byte at a time instead of using buffered reads, and extensive file I/O.
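Two of these issues have well-known remedies, sketched below. The class HotSpotFixes and its methods are illustrative names, not part of any library: the first method builds a string with a StringBuilder instead of repeated concatenation in a loop, and the second reads a stream in chunks through a buffer instead of one byte at a time.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical examples of fixes for two common profiling findings. */
final class HotSpotFixes {

    /** Joins the parts with one StringBuilder instead of repeated String concatenation. */
    static String join(String[] parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    /** Counts the bytes in a stream, reading in 8 KB chunks through a buffer. */
    static long countBytes(InputStream in) {
        long total = 0;
        byte[] chunk = new byte[8192];
        try (InputStream buffered = new BufferedInputStream(in)) {
            int n;
            while ((n = buffered.read(chunk)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Whether either change matters for a given application is something only the profile can tell you; apply them where the profiler shows time being spent.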
Knowing the time spent in different methods may lead you to rewrite a method to use a different algorithm, if the bottleneck is an inefficient algorithm rather than a data structure. Adherence to good Java programming practices also helps. The JDK 1.3.1 version shipped with Sun ONE Application Server 6.5 implements the Java Virtual Machine Profiler Interface (JVMPI).
HPROF, a profiler shipped with the JDK software, generates helpful profiling information. One way to enable HPROF is to add -Xrunhprof:cpu=samples,file=/tmp/my_profile.out to JAVA_ARGS in your iasenv.ksh file. HPROF can be configured in many other ways; see the HPROF documentation for details.
The following is a sample dynamic stack trace from HPROF.
TRACE 116:
    java.lang.Class.forName0(Class.java:Native method)
    java.lang.Class.forName(Class.java:120)
    com.sun.corba.ee.internal.corba.ServerDelegate.class$(ServerDelegate.java:83)
    com.sun.corba.ee.internal.corba.ServerDelegate.getClientSubcontractClass(ServerDelegate.java:126)
TRACE 240:
    com.sun.corba.ee.internal.POA.POACurrent.peekThrowNoContext(POACurrent.java:169)
    com.sun.corba.ee.internal.POA.POACurrent.get_object_id(POACurrent.java:64)
    com.sun.corba.ee.internal.POA.DelegateImpl.this_object(DelegateImpl.java:36)
    org.omg.PortableServer.Servant._this_object(Servant.java:64)
TRACE 28:
    com.kivasoft.gds.GDSKey.getValuenative(
The trace numbers are important for correlating a stack with a certain heap profile or a CPU usage profile. The part of the profile that shows CPU usage for the same application is shown next.
CPU SAMPLES BEGIN (total = 330) Fri Feb 8 15:29:48 2002
rank   self   accum   count  trace  method
   1  76.97%  76.97%    876     41  com.kivasoft.thread.ThreadBasic.run
   2  11.52%  88.48%    108    277  com.kivasoft.lcycmgr.LifeCycleMgr.waitForStoppedStatenative
   3   1.52%  90.00%     12    493  com.kivasoft.thread.ThreadBasic.run
   4   0.91%  90.91%      5    534  com.kivasoft.types.COM.COMClear
   5   0.91%  91.82%      7    494  com.kivasoft.types.COM.COMClear
   6   0.91%  92.73%      5    516  com.kivasoft.util.ValList.getValStringnative
   7   0.91%  93.64%      6    500  java.net.SocketInputStream.socketRead
   8   0.61%  94.24%      3    530  java.lang.Object.wait
   9   0.61%  94.85%      3    524  com.kivasoft.types.COM.COMClear
  10   0.61%  95.45%     10    495  com.kivasoft.util.ValList.getValStringnative
  11   0.30%  95.76%      2    512  java.lang.ClassLoader.defineClass0
  12   0.30%  96.06%      3    506  com.kivasoft.types.COM.COMClear
  13   0.30%  96.36%      5    319  java.lang.Class.newInstance0
  14   0.30%  96.67%      1    547  oracle.jdbc.driver.OracleStatement.clearDefines
  15   0.30%  96.97%      2    499  java.lang.Object.notify
  16   0.30%  97.27%      1    548  java.lang.Thread.currentThread
  17   0.30%  97.58%      1    542  com.kivasoft.util.Stream.flushnative
  18   0.30%  97.88%      4    497  java.lang.Throwable.fillInStackTrace
  19   0.30%  98.18%      1    544  java.lang.Class.newInstance0
  20   0.30%  98.48%      1    545  oracle.jdbc.driver.OracleStatement.doDefaultTypes
  21   0.30%  98.79%      2    507  sun.misc.URLClassPath.getLoader
  22   0.30%  99.09%      1    543  java.net.URLClassLoader.findResource
  23   0.30%  99.39%      2    279  com.kivasoft.bind.BinderServlet.bind
  24   0.30%  99.70%      1    546  java.lang.Thread.currentThread
  25   0.30% 100.00%      2    533  sun.misc.URLClassPath.getLoader
CPU SAMPLES END
Again, note that the trace column in the preceding output can be used to connect this output to a stack trace.
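When a profile is long, this correlation can be done with a small script. The sketch below parses one row of the CPU samples section, assuming the six whitespace-separated columns shown above (rank, self, accum, count, trace, method), and produces the header line to search for in the same file. The class CpuSampleRow is a name invented for this illustration and is not part of HPROF.

```java
/** Hypothetical parser for one row of the HPROF CPU samples section. */
final class CpuSampleRow {
    final int rank;
    final double selfPercent;
    final double accumPercent;
    final int count;
    final int traceId;
    final String method;

    private CpuSampleRow(int rank, double selfPercent, double accumPercent,
                         int count, int traceId, String method) {
        this.rank = rank;
        this.selfPercent = selfPercent;
        this.accumPercent = accumPercent;
        this.count = count;
        this.traceId = traceId;
        this.method = method;
    }

    /** Parses a row such as "7 0.91% 93.64% 6 500 java.net.SocketInputStream.socketRead". */
    static CpuSampleRow parse(String row) {
        String[] f = row.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 columns: " + row);
        }
        return new CpuSampleRow(
                Integer.parseInt(f[0]),
                percent(f[1]),
                percent(f[2]),
                Integer.parseInt(f[3]),
                Integer.parseInt(f[4]),
                f[5]);
    }

    private static double percent(String s) {
        return Double.parseDouble(s.replace("%", ""));
    }

    /** Header line that introduces the matching stack trace in the same profile file. */
    String traceHeader() {
        return "TRACE " + traceId + ":";
    }
}
```

Searching the profile file for the string returned by traceHeader() leads to the stack that was on the CPU when the samples in that row were taken.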
Tuning Based on Data Obtained
CPU hot spots and excessive object creations are important candidates for streamlining. The processor usage profile can report the time spent in different methods; look for ways to minimize time that looks relatively high. Cumulative time spent in a certain method and all the methods it calls are important pieces of information. After some iterations, the profile should look tuned.
System-level indications of a tuned application under load include a high percentage of time in user mode, relatively low system time (absolute times depend on the application), and low mutex contention. Continue iterating through profiling, tuning, and load testing until the desired performance is reached. If performance goals are not realized with basic tuning, a rearchitecture of the software may be necessary.
Summary
Profiling and load testing are important parts of the application development cycle that should not be neglected.
Resources
For more about tools and other sources of information referred to here, consult the following resources: