
Solaris Developer Chat Sessions

May 17, 2001

Chat Title: Solaris Utilities for Monitoring System Performance
Guest Speakers: James Liu and Karpagam Narayanan

This is a moderated forum

LizA: Welcome to the Solaris Live Chat, "Solaris Utilities for Monitoring System Performance" with James Liu and Karpagam Narayanan. James was our first Solaris Live! guest and we're very happy to have him back. James is ready to answer your questions on software development and benchmark formation strategies and configuration, scaling analysis, processor management, thread libraries, and so on. He is joined by Karpagam Narayanan, who has lots of experience with all the standard tools like Virtual Adrian (aka SE Toolkit), disk partitioning, network bandwidth trunking, and other things that get your app to run faster on Solaris. Karpagam and James, let's say that I'm new to Solaris and I want to know how much CPU a process takes. Is there a command that shows me this?

jamesliu: I'll take this one. A number of commands can show this. You can use prstat, which is bundled with Solaris 8 and is probably easiest. If you have the freeware top, you can use that too.

LizA: What does NLWP mean in prstat?

karpagam: NLWP refers to the number of lightweight processes (LWPs) associated with the process.

LizA: How does someone find out which processors are online or offline?

jamesliu: You can find out using the psrinfo command. The -v option gives you a lot of info on the processors.

LizA: I need to increase the file descriptors on my server...I bumped up the ulimit but it still doesn't work. What else do I need to do?

karpagam: Increase the rlim_fd_max and rlim_fd_cur parameters in /etc/system. Remember that these take effect after you reboot.

jamesliu: LizA, you can also gain some efficiencies if your problem is related to using network file descriptors (i.e. sockets). You can tune the TCP/IP parameters using the ndd /dev/tcp command to shorten the tcp_time_wait_interval.

tefluid: I'm interested in optimizing application servers in order to run Java engines such as BEA WebLogic and ATG Dynamo. What advice can you give on profiling the system to best determine where the bottlenecks lie?

karpagam: This is a Java on Solaris question. Java has a profiling tool called hprof that can be included in the command line. Type -Xrunhprof:help for more info on this. The output gives you methods that take more CPU time...

karpagam: tefluid, there is also a Heap Analysis Tool (HAT) available, as well as third-party GUI tools. Optimizeit and JProbe are two of them.

LizA: I heard that in Solaris you can allocate certain processors to work on only one process. Will that help, too?

jamesliu: LizA, you can in fact dedicate certain processors to a specific process. The command to use is psrset. For folks like tefluid, binding the JVM PID to a processor set and excluding interrupts can possibly give a boost in performance.

Craki: I have a farm of Sybase database boxes all on Solaris 8. Where can I start in making sure that everything that can be optimized for database operations is?

karpagam: Craki, I would always start with the db monitoring tools. Once you are sure that you do not have any issues go through the system parameters...

karpagam: Craki, Start by looking into shared memory, semaphores and message queue parameters first in the /etc/system. Then look into disk, network, NFS, swapping/paging, memory, CPU, filesystem, and TCP, one at a time...

karpagam: Craki, do look in http://www.sun.com/sun-on-net/performance/perftools-solaris8.pdf for more info on Solaris tools

Zartaj: I am interested in performance comparisons between Sun Solaris and Wintel. The problem is it is not easy to decide what is the right pair to compare. I have a UE250 450MHz with Solaris 8 and a P3 733 MHz with Windows 2000. I have seen the Wintel box consistently outperform the UE250. But is that a fair comparison? In general if I have a Sun system how do I determine what is the equivalent Wintel system to compare. Going by price alone, Wintel seems to have the edge.

jamesliu: Zartaj, it is often a race for more MIPS/MFLOPS, etc. in the hardware area. I don't know which benchmarks you run, but for the apps that are important to Sun's customers, Sun consistently tunes our applications to outscale and outperform anything on the market. It all depends on the use. In your particular case, it may in fact be that Wintel has better price performance. For many of Sun's core customers, our value proposition is reliability, availability and scalability. We've competed well on this philosophy for about 18 years and I predict we'll continue. As for your particulars, perhaps we can communicate offline and discuss how to improve your performance.

alexc: We use some scripts to automate gathering info from ps. We also use sar. We notice that total CPU utilization (by adding up ps info) is usually quite a bit less than what is stated by sar. Why is there a discrepancy?

karpagam: Alexc, I am not sure what ps you are referring to - /usr/ucb/ps? In what version of Solaris? I do not know the time interval that ps uses for data gathering. If you are in Solaris 8, try using prstat. There are a lot of parameters that can come into play here - interval, versions, options for the tools, etc...

LizA: What do I need in order to look at mpstat? What do the columns mutexes and context switching mean?

karpagam: LizA, mutexes occur when a lot of CPUs are trying to grab the same resource lock. Only one CPU will be successful at any time. We do not want this to happen a lot...

jamesliu: LizA, context switching is also something that, done too often, expends resources... What you want to do is to limit these values to certain levels. smtx, for example, is best below 500 per CPU per second. Context switches ... you can check at http://www.setoolkit.com.
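
The smtx rule of thumb above can be sketched as a small check. This is only an illustration: the function name and the sample readings are hypothetical, and the 500 per CPU per second limit is the figure quoted in the chat, not a hard threshold.

```python
# Rule of thumb quoted in the chat: smtx is best below 500 per CPU per second.
SMTX_LIMIT = 500

def busy_mutex_cpus(smtx_by_cpu, limit=SMTX_LIMIT):
    """Return the sorted CPU ids whose smtx rate exceeds the limit."""
    return sorted(cpu for cpu, smtx in smtx_by_cpu.items() if smtx > limit)

# Hypothetical per-second smtx readings keyed by CPU id (not real measurements).
sample = {0: 120, 1: 740, 2: 480, 3: 1310}
print(busy_mutex_cpus(sample))
```

In practice the readings would come from the smtx column of mpstat run with an interval; parsing that output is left out of this sketch.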

Zartaj: I'd like to know what tools are available for shared library profiling? Shared libraries cannot be instrumented for prof or gprof. And the LD_PROFILE variable can be used only for one shared library at a time. So how do I go about profiling all shared libraries being used by an app?

karpagam: Zartaj, You can try using truss and sotruss. truss gives shared library activity and entry/exit trace of user-level function calls. sotruss is good and has less noise than truss...

dmdebertin: Are there any particular columns in vmstat (or other command) output that could indicate hardware or software problems? What are some things to look for that could indicate problems, and what is harmless?

jamesliu: DMDebertin, if your CPU percentage is high but system usage is low, most of the CPU is consumed by your app. You may want to think about tuning your code in this case. If system time is high, check out more with mpstat and look at context switch and smtx values.
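
As a rough sketch, the reasoning above can be written as a decision on the user (us) and system (sy) percentages that vmstat reports. The function name and both cutoffs are assumptions chosen for illustration; the chat does not give numbers for "high".

```python
def vmstat_hint(us, sy, busy=80, sys_high=30):
    """Suggest where to look next from user and system CPU percentages.

    busy and sys_high are hypothetical cutoffs, not values from the chat.
    """
    if sy >= sys_high:
        # High system time: inspect context switches and smtx with mpstat.
        return "check mpstat"
    if us + sy >= busy:
        # CPU is busy but mostly in user mode: the application is the consumer.
        return "tune application"
    return "no obvious CPU problem"

print(vmstat_hint(85, 5))
print(vmstat_hint(40, 45))
```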

Emory2: Could you please compare the performance of a 24 CPU SunFire 6800 to the performance of a 24 CPU IBM S80 (configured with the same amount of RAM).

karpagam: Emory2, for what workload? You can consider looking into the TPC-C, TPC-D, and SPEC standard benchmark pages that match your workload.

LizA: How do I monitor the network?

karpagam: LizA, the primary tool you can use is netstat. There are options like -in for cumulative data, -s for TCP/UDP stats, -I for specific interface. I like to put in netstat -in in a while loop...

jamesliu: LizA, Sun also provides some scripts for tuning your network drivers. http://www.sun.com has these scripts. Search for "network tuning" or "syn flood" and you should see some docs on how to tune your network interface.

karpagam: LizA, netstat -a gives a lot more information on the sockets/ports open. Look for ESTABLISHED and TIME_WAIT.

LizA: netstat -a tells me that I have over 8000 connections. But I have only 3000 sessions open. They have a time_wait status on more than half of them. Is that something to do with my application?

jamesliu: LizA, Regarding netstat output, you'll probably have lots of network sessions still waiting to close. The default setting on Solaris is 240 seconds. You can use ndd /dev/tcp to set the tcp_time_wait_interval to a lower value so that these connections close down more quickly. Say 30 seconds is good. Be careful not to set this too low as slow connections (e.g. modems) might get dropped.
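
To see how many connections are sitting in TIME_WAIT compared with ESTABLISHED, the netstat output can be tallied by state. The sketch below assumes the state is the last column of each TCP line; the sample lines are made up to resemble that layout and are not a real capture, and the function name is hypothetical.

```python
from collections import Counter

# TCP states to tally; the chat singles out ESTABLISHED and TIME_WAIT.
KNOWN_STATES = {"ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "LISTEN"}

def count_tcp_states(netstat_text):
    """Tally connection states, assuming the state is the last column."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[-1] in KNOWN_STATES:
            counts[fields[-1]] += 1
    return counts

# Made-up lines shaped like TCP entries in netstat output (not a real capture).
SAMPLE = """\
10.0.0.5.80   10.0.0.9.40112  8760 0 24820 0 ESTABLISHED
10.0.0.5.80   10.0.0.9.40113  8760 0 24820 0 TIME_WAIT
10.0.0.5.80   10.0.0.7.51200  8760 0 24820 0 TIME_WAIT
*.80          *.*                0 0 24576 0 LISTEN
"""

counts = count_tcp_states(SAMPLE)
print(counts["ESTABLISHED"], counts["TIME_WAIT"])
```

A TIME_WAIT count well above the number of live sessions matches the situation described in the question.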

Zartaj: I believe a 32-bit process can only use around 3GB out of a possible 4GB. So is it useful to have more than 4GB physical memory on a system that allows it?

karpagam: Zartaj, what you need to look into is how much your application uses/needs. Are you running 64-bit Oracle and need more than 4GB SGA? Use pmap to tell you the process footprint and calculate on that basis.

Zartaj: In the Solaris Multithreading Guide, it recommends against thread-pooling saying it is cheaper to create threads as needed. Do you agree with that?

jamesliu: Zartaj, in general I would agree that threads are relatively cheaper to create than to pool. Pooling creates many potential opportunities for contention. However, in some cases, such as Java, the threading model may be more amenable to pooling since there is a Java layer there.

jd: The way I understand load average to be calculated, it is incremented by 1 for every CPU's worth of time spent. (Ex. a 10 CPU system with 10% user time as shown by vmstat will report a load avg. of 1). High system time (as shown in vmstat) causes load to jump very high in some cases; I have seen load avg. of 30 on a 10 CPU system with 40% system time/10% user time. I would like to know how the system comes up with that load avg.

jamesliu: jd, I couldn't tell you exactly how the algorithm works. It's been a while since I've touched on it. Karpagam?

karpagam: jd, a high system time of that ratio clearly shows that there is a bottleneck. Did you check to see how your disks are doing? You also might want to see in mpstat/top/prstat/statit what the utilization per processor is.

Craki: I find that whenever a box has fairly high uptime, reported memory usage is higher than it should be. My DBAs see this and start getting worried about the boxes not being big enough. Is this a Solaris behavioral quirk?

jamesliu: Craki, I can't be certain, but our experience shows that in uptimes of 60+ days, the memory footprint remains stable on many of our servers. The most common area of memory growth over time we've seen has perhaps been in memory leaks on the application or windowing side. Many windowing apps or servers or window managers do in fact leak lots of memory. This may be the cause of growth over time.

jd: I am not asking about a problem in particular, I have just seen the load avg. jump like that and am curious as to how it's calculated.

karpagam: jd, Did you see this on Solaris 8?

Emory2: Does anyone know if there is a working version of "proctool" for Solaris 8? One version that we tested did not work for multiprocessors.

karpagam: Emory2, you can use /usr/proc/bin proc tools - right? pmap, ptree, ptime, pldd, etc...

jd: I have seen it on 2.6 and 8; the most recent was on 8 where a Java programmer had an app. that went crazy with creating/deleting threads.

jamesliu: jd, I guess you're still asking about how the load average is computed. Again, I can't tell you offhand since it's been a while since I've touched the algorithms. But I can imagine that any process that creates/destroys lots of threads is a contrived and somewhat unique situation. Perhaps we can work offline to discuss optimization and development techniques to reduce the CPU utilization.

LizA: Are there any special libraries I can use to improve performance?

jamesliu: LizA, there are a number of libraries that might boost performance. Some are in Solaris 8, some are third party. If you have a thread-intensive application and have high smtx values due to schedlock, you may want to put /usr/lib/lwp, which holds an alternate thread library, at the top of your LD_LIBRARY_PATH. If your app is memory-allocation intensive, there are 3 ISV solutions that replace the bundled malloc on Solaris and improve performance.

alexc: question about threading, etc., ... the way I understand it, some programmers use multiple processes to do threading (spawning child processes) and some use threads within a single process. Clearly, multiple processes can run on multiple processes simultaneously. However, can threads within a single process run on more than one processor simultaneously?

alexc: Rather, multiple processes can use multiple processORs, but can threads within a single process do the same?

jamesliu: Alexc, absolutely. Threads do run on multiple processors on Solaris, as do multiple processes with multiple threads. Solaris supports scheduling that allows a many-to-many relationship between threads or processes and processORs.

Craki: Can you recommend a centralized monitoring/management package? I've done a small deployment of Sun Management Center and liked it. Would Big Brother be a good solution as well?

karpagam: Sun Management Center is very good. If you want to monitor database statistics also, I know that a lot of folks use Foglight from Quest Software. I do not know about Big Brother - sorry.

LizA: We're about out of time. Thanks to Karpagam and James...and all of you who asked such great questions. Karpagam and James, do you have a few parting words?

jamesliu: It has again been a pleasure. I'd be pleased to field questions in this forum again soon. -JCL

jamesliu: Note to all, if you're running either vmstat or mpstat, just make sure you put a time interval like 5 seconds and exclude the first entry in your computations. - jcl

karpagam: Thanks everyone for all the wonderful questions. It has been a pleasure. Thanks LizA for running this forum smoothly :)
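
The advice to drop the first entry matters because the first line these tools print summarizes activity since boot rather than the current interval. A minimal sketch of averaging while skipping that line follows; the function name and the sample values are hypothetical.

```python
def average_after_first(samples):
    """Average a column of readings, ignoring the first (since-boot) entry."""
    usable = samples[1:]
    if not usable:
        raise ValueError("need at least two samples to skip the first one")
    return sum(usable) / len(usable)

# Hypothetical CPU-busy percentages collected at a 5 second interval;
# the first value stands in for the since-boot summary line.
readings = [62, 10, 12, 14]
print(average_after_first(readings))
```

Including the first value here would pull the average up to 24.5, which does not reflect what the system was doing during the sampling window.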

LizA: Be sure to join us again on June 21, at 10 a.m. PDT, when our guest is Rich Teer and the topic is "Secure C Programming."
