Bottleneck Basics: Understanding and Preventing Systems Slowdowns

 
By Jamie Wilson, June 2003  

Summary

Has increased demand caused a shortage of resources on your server? Are customers complaining about slow response times? In these days of exponential network growth, keeping up with demand can be a difficult challenge. Jamie Wilson explains what you can do to analyze your current resource demands, and gives tips on planning for future growth. (2,100 words)

It's a phone call most administrators never want to receive. "The server is slow, no one can check email. Web pages are loading slowly, or not at all!" Too often administrators find themselves trying to climb up the steep slope of increased demand. As a user base grows, the demand placed on the server grows as well. This growth may be linear and predictable, or it may be completely random or exponential.

There are ways to avoid the angry phone call altogether. Understanding system bottlenecks and gathering statistical data can help you project your system's current and future needs. This can eliminate user complaints -- and prevent that phone from ringing.

What causes a bottleneck?

Why does a system slow down in the first place? Slowdowns can usually be attributed to one or more bottlenecks, which are caused when part of the system is not running fast enough to keep up with the demands placed on it. The most common bottlenecks occur for the following reasons:

  • Slow disks or disk arrays aren't able to handle I/O requests quickly enough
  • The system is starved for memory, so applications are forced to swap to disk, which can slow response
  • The system is out of processor power
  • The network interface is overloaded

So how can you tell which of these subsystems may be having a problem? By using the various tools of the capacity planning trade: sar, netstat, lockstat, and top.

sar

sar is by far one of the most valuable tools an administrator has to track past trends and predict future demand. sar is only installed by default with the full distribution of Solaris. Verify that sar is installed on your system:

pkginfo -l SUNWaccu

If it's not currently installed, you can add it by installing SUNWaccu.

Once sar is installed, you'll need to configure it to begin collecting data. First, edit the system's crontab:

crontab -e sys

Remove the comments so that you have these lines:

0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A

Then edit /etc/init.d/perf:

vi /etc/init.d/perf

Remove the comment characters from the lines below the comment that reads "Uncomment the following lines".

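This will enable sar for system-activity reporting. You may also want to increase sar's log retention by editing the find command in sa2; with -mtime +30, as below, only files more than 30 days old are removed: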

vi /usr/lib/sa/sa2
/usr/bin/find /var/adm/sa \( -name 'sar*' -o -name 'sa*' \) -mtime +30 -exec /usr/bin/rm {} \;

Your system will now begin gathering data. For a detailed explanation of how to use sar, please see the sar man pages. Here is a quick list of sar's more useful features:

  • sar run with no options shows CPU usage
  • sar -q shows your average queue size
  • sar -p and sar -g show paging activity
  • sar -d shows disk utilization
  • sar -f reads a previously saved file, for example sar -f /var/adm/sa/sa03
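
The saved files are named after the day of the month, so sa03 holds the data for the 3rd. As a small convenience, here is a sketch of a helper that builds the file name for a given day. The function name sa_file and the variable SA_DIR are made up for this example; only the /var/adm/sa directory and the saDD naming come from the setup above.

```shell
# Directory where sa1 writes its daily data files (as configured above).
SA_DIR=/var/adm/sa

# sa_file DAY: print the path of the saved sar data file for a day of the month.
# Hypothetical helper, not part of Solaris.
sa_file() {
    day=${1#0}    # drop one leading zero so "08" is not parsed as octal
    printf '%s/sa%02d\n' "$SA_DIR" "$day"
}
```

You could then review the queue size recorded on the 3rd with sar -q -f "$(sa_file 3)".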

netstat

One of sar's shortcomings is that it will not trend network traffic for you. This can be done using netstat. netstat -in will show you your network interfaces, how much traffic they have passed since booting, and any problems with them.

netstat -in

Name  Mtu   Net/Dest       Address        Ipkts       Ierrs   Opkts       Oerrs  Collis  Queue
hme1  1500  192.168.100.0  192.168.100.1  1477758588  0       2897473608  0      0       0
hme2  1500  192.168.101.0  192.168.101.1  3228181693  157415  3365694030  0      0       0

From this example, you can see that hme1 and hme2 are very busy, with hme2 having seen some incoming errors on its interface.
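
Raw error counts are easier to judge as a fraction of the packets passed. The following is a minimal sketch, assuming the ten-column netstat -in layout shown above and a POSIX awk (on Solaris that would be nawk or /usr/xpg4/bin/awk). The function name iface_error_rates is invented for this example.

```shell
# iface_error_rates: read "netstat -in" output on stdin and print, per
# interface, input and output errors as a percentage of packets.
# Hypothetical helper; assumes the column order
# Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue.
iface_error_rates() {
    awk '
        # Skip the header and any blank or short lines.
        $1 == "Name" || NF < 10 { next }
        {
            in_pct  = ($5 > 0) ? 100 * $6 / $5 : 0
            out_pct = ($7 > 0) ? 100 * $8 / $7 : 0
            printf "%s in_err=%.4f%% out_err=%.4f%% collis=%s\n", $1, in_pct, out_pct, $9
        }
    '
}
```

Run it as netstat -in | iface_error_rates. For the sample above, hme2's 157415 input errors work out to roughly 0.0049 percent of its incoming packets.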

lockstat

With Solaris 2.6 and up, Sun included a utility called lockstat, which can show you what is causing kernel locking. The lockstat man pages are available for more information. Here is one example of how to use this utility:

lockstat sleep 30 > /tmp/lock.out
more /tmp/lock.out

Callers with the most lock counts may be causing problems. If you see hmestart or qfestart causing many kernel locks, you may need to add another network interface.

top

top is not installed with Solaris, but it is an invaluable tool that offers a realtime snapshot of what's happening on the system; you can download it from http://www.sunfreeware.com. top will show you how much memory is free on the system, and which processes are using the most CPU or memory resources.

So where's the slowdown?

Using tools such as sar, netstat, and lockstat can help you determine where a slowdown might be happening, or where one is about to happen. Here are some examples of how you can use these tools:

  • sar with no options. This will show how idle the CPUs are. If your CPUs are spending a lot of time in %usr or %sys, you may have to add extra CPUs to deal with increased demand. If %wio is high, your system is waiting for your I/O subsystems to catch up. You may have a slow disk or array.


  • sar -g. If pgscan/s is consistently high, the page scanner is working hard to free memory and your system is swapping. No swapping is the only good swapping. Your system is probably short on memory. Use sar -r to verify this.


  • netstat -in. Look to see if an interface is overloaded with traffic. If so, you may have to add another physical interface. Also, look for Ierrs, Oerrs, and Collis. These should all be relatively low numbers if not zero. High numbers in these columns can indicate network problems, such as speed or duplex autonegotiation issues, bad cabling, or a bad switch port.


  • top. If all else fails, look at top. What process is taking up the most resources?
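
The first check in the list can be sketched as a small filter over sar's CPU report. This is only an illustration: it assumes the report ends with an Average line whose columns are %usr, %sys, %wio, and %idle, and the 20 percent and 80 percent cutoffs are arbitrary values chosen for the example, not recommendations. The function name cpu_hint is made up.

```shell
# cpu_hint: read "sar" CPU output on stdin and print a one-line hint
# based on the Average row. Hypothetical helper; the thresholds below
# are illustrative assumptions only.
cpu_hint() {
    awk '
        $1 == "Average" && NF >= 5 {
            usr = $2; sys = $3; wio = $4; idle = $5
            busy = usr + sys
            if (wio + 0 >= 20)
                print "high %wio (" wio "): check disks with sar -d"
            else if (busy >= 80)
                print "high %usr+%sys (" busy "): CPU-bound, check top or mpstat"
            else
                print "CPU looks fine (" idle "% idle)"
        }
    '
}
```

Use it as sar | cpu_hint for today's data, or sar -f /var/adm/sa/sa03 | cpu_hint for a saved day.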

Analyze the data and make recommendations

So you've put together all of your reporting tools. You're able to do past trend analysis and future growth predictions based on sar. You can also do realtime snapshots using top. What should you do to make the system perform better now, as well as in the future?

It's very important to note that if you do identify and solve a bottleneck, your solution can potentially cause even worse problems. For example, if you have idle CPU and a busy disk, replacing the busy disk with a fast disk can cause the CPU usage to spike. Remember, capacity planning is a constant exercise, not a one-time activity. Here are some scenarios:

  • Busy I/O subsystems. Say you've determined by using sar -d that one or more of your disks is very busy (more than 90 percent busy). Either move I/O from that disk to a faster disk or array, or split up the I/O amongst many arrays, depending on the data. Remember also that SCSI interfaces can be overloaded as well. This is difficult to determine, but it's a good idea to add new SCSI interfaces and balance I/O traffic accordingly. Improving I/O access can have a major impact on CPU or network performance.


  • Busy CPUs. Using sar, it may become apparent that your system is spending most of its time in %usr and %sys. You may also want to use mpstat to see more information about your CPUs. Adding CPUs in this situation can help, but it may not solve the problem. A poorly written application can consume as much CPU as you give it.


  • Busy network. netstat -in and lockstat may show your network interface to be very busy. Add another physical interface, but beware of increased I/O and CPU demands.


  • Is the system swapping? Add more memory. Do whatever you can to prevent the system from swapping. If possible, create swap on fast disks.
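
The busy-disk check from the first scenario can be automated along these lines. Treat this as a sketch under assumptions: it expects sar -d output in which each data row ends with the device name followed by %busy and five more columns, with the first row of each group prefixed by a timestamp or the word Average. The function name busy_disks is invented, and the default limit of 90 simply mirrors the figure used above.

```shell
# busy_disks [LIMIT]: read "sar -d" output on stdin and print the devices
# whose average %busy exceeds LIMIT (default 90).
# Hypothetical helper; assumes rows of either
#   Average device %busy avque r+w/s blks/s avwait avserv   (8 fields)
#   device %busy avque r+w/s blks/s avwait avserv           (7 fields)
busy_disks() {
    awk -v limit="${1:-90}" '
        $1 == "Average" { in_avg = 1 }     # only the averages section is reported
        !in_avg { next }
        NF != 7 && NF != 8 { next }        # ignore blank or unexpected lines
        NF == 8 { dev = $2; busy = $3 }
        NF == 7 { dev = $1; busy = $2 }
        busy + 0 > limit + 0 { print dev, busy }
    '
}
```

For example, sar -d | busy_disks lists disks averaging more than 90 percent busy, while sar -d | busy_disks 70 lowers the limit to 70.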

Application slowdowns

Sometimes system hardware isn't the problem at all. Remember that applications are what consume system resources, and poorly written applications can be very difficult to deal with. Here are some bits of advice:

  • Beware of single-threaded applications. While a single-threaded application is generally easier to develop, it's also more costly to run. Many applications developed in-house are single-threaded. The worst example is the single-threaded nonforking application. This is an application that's not only single-threaded, but also won't fork copies of itself to consume resources more efficiently. top will only show one instance of this daemon running. ps -eLf will only show one thread. This can be a very challenging application, as it may only consume a single CPU even if you add more CPUs. Single-threaded applications that fork copies of themselves are much easier to deal with, but still are not as efficient as a multithreaded application.


  • Learn as much as possible about the application you're dealing with. Talk to the vendors or the authors, because they'll know what tricks and tips will work best. Often, entries need to be made in /etc/system so that an application can work at peak capacity. ndd settings may also need to be tweaked based on your current needs. Consider all of these performance suggestions before adding new hardware.

Planning for future capacity

Sometimes the best way to plan for the future is to look at your past performance data. Using sar, you can ascertain a trend in the resource consumption on your system. If your system CPU was 90 percent idle three months ago, and now it's 80 percent idle, it's not unreasonable to assume that in three months your system will only have 70 percent idle CPU. Some parts of your system may grow at exponential rates, such as I/O or network subsystems. That's why it's important to constantly gather data, so you can see where you've been and where you're going. You may also want to consider writing scripts that can monitor sar and alert you when certain thresholds are reached. If your I/O is 70 percent busy for more than a week, it's probably time to consider a replacement or an upgrade.
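
Both ideas in this paragraph, the straight-line projection and the threshold alert, fit in a few lines of shell. The sketch below is one possible shape, not a finished monitoring tool: the function names project_idle and alert_if_low_idle are invented, the projection assumes growth stays linear, and the alert assumes sar's CPU report ends with an Average line whose last column is %idle.

```shell
# project_idle PREVIOUS CURRENT: extend the change between two idle
# readings by one more period (linear assumption), never below 0.
project_idle() {
    awk -v prev="$1" -v cur="$2" 'BEGIN {
        proj = cur - (prev - cur)
        if (proj < 0) proj = 0
        print proj
    }'
}

# alert_if_low_idle LIMIT: read "sar" CPU output on stdin; print a warning
# and return 1 if the average %idle is below LIMIT, otherwise return 0.
alert_if_low_idle() {
    awk -v limit="$1" '
        $1 == "Average" && NF >= 5 { idle = $5; seen = 1 }
        END {
            if (seen && idle + 0 < limit + 0) {
                print "WARNING: average %idle " idle " is below " limit
                exit 1
            }
        }
    '
}
```

With the figures above, project_idle 90 80 prints 70. A cron job could run sar | alert_if_low_idle 20 and mail you the warning whenever the day's average idle time drops under 20 percent; the limit is yours to choose.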

Communication within your own organization can help you meet future capacity as well. You need to know if your marketing department is planning a big push to acquire more customers, or if a new accounting system is going into place next week. Growth is then predictable, as you can plan for increased access to your database or for exponential growth in your Web server's traffic. Knowing how your customers will be using your servers will help you provide better performance.

Scaling horizontally and vertically

For large-scale applications, it's extremely important to be able to scale your systems both horizontally and vertically. Horizontal scaling allows you to add many boxes to serve the same application, while vertical scaling allows you to break the application into pieces so that each one can be scaled horizontally. A system designed to be both horizontally and vertically scalable allows you to add servers as demand increases. This way, you avoid the pitfalls of trying to scale one big box, and can benefit from having many small boxes.

Here are some examples of horizontal and vertical scaling:

  • Horizontal Web servers. Multiple Web servers are set up serving identical content, using independent hardware on different networks. DNS round robin or load balancing can be used.


  • Horizontal and vertical email solutions. Each component of the email server (mx, SMTP, POP, Web mail) can be run on its own independent server. Multiple individual servers can be set up to balance the load. In this way, you can have four mx servers, two SMTP servers, two POP servers, and one Web mail server, or whatever configuration you need to meet demand.


  • Horizontal and vertical Web servers. Multiple Web servers can be set up -- some that serve graphics, and others that serve just CGI scripts. Servers can be added as demand increases.

Staying ahead of the curve

Using reporting tools such as sar makes it possible to identify trends on your system. Learning about the applications on your system and communicating with your organization can also help when planning future growth. Finally, designing a system that can scale both horizontally and vertically can help you stay one step ahead of the growth curve.

Resources

Reprinted with permission from the December 2000 edition of Unix Insider.
