[Packer01] Chapter 21. Drill-Down Monitoring

The time has come. Armed only with your wits, a little common sense, and some basic system knowledge, you’re going to crack the performance problem bedevilling your database server. You roll up your sleeves and seat yourself firmly in front of a keyboard. A cluster of slightly awed colleagues watches wide-eyed over your shoulder.

OK, perhaps I’m getting a bit carried away here. Suffice it to say that the aim of this chapter is to develop a simple method for identifying performance problems on database servers.

I’m assuming you have already looked at the issues covered in earlier chapters. Your application’s behavior is well understood and consultants in small doses have already done wonders with application performance. You’ve done what you can to fix any database schema design problems and the addition of a couple of crucial indexes has already calmed the users down a little.

You’ve checked out obscure things like environment variables and racked your brains for other issues that might need attention. But performance problems still persist. Perhaps you need to upgrade your hardware, but at this point you’re not sure.

Where should you start? I’m going to suggest a five-step process that will walk you through the major components of the system: memory, disk I/O, network, and CPU, followed by database monitoring and tuning. If a problem becomes apparent in one of these areas, there may be further steps to narrow the problem. This kind of “drill down” approach is an effective way to identify and ultimately solve problems.

If you find a bottleneck (by which I mean a constriction of performance, just as the neck of a bottle limits the flow of liquid into or out of a bottle), does that mean you should look no further? I would suggest going through the whole process anyway to see what you can discover.

Bear in mind, though, that fixing a bottleneck in one place might expose another elsewhere. Suppose, for example, your system is paging severely due to a lack of memory, but no problems are apparent elsewhere. Adding memory might allow your throughput to improve to the point where one of the disks becomes overutilized, resulting in a disk bottleneck. Checking out the disks as well might give a hint of problems to come.

Once you’ve found and resolved a bottleneck, go through the entire process again.

Finally, is the order of the steps important? Of course there are many possible ways of tackling system monitoring, but I suggest you go through the steps in the order shown.

STEP 1. Monitoring Memory

To check for a memory bottleneck, use the vmstat utility, which shows, among other things, memory behavior for the system. A 5-second interval is a good choice for live monitoring.

The vmstat trace in Figure 21.1 shows a system with no evidence of memory shortfall.

Figure 21.1. vmstat trace with no memory shortfall

 procs     memory            page            disk          faults      cpu 
 r b  w   swap  free   re  mf pi po fr de sr m1 m2 s6 sd  in   sy   cs  us sy id 
 0 8  0 3557104 1359368 0  621 0  0  0 0  0  0  0  0  0  417 11922 1190 12  2 86 
 0 7  0 3555728 1358080 0  729 0  0  0 0  0  0  0  0  0  449 12797 1985 20  2 78 
 5 9  0 3512184 1318120 0 3666 0 12 12 0  0  0  0  0  6 2198 32163 7404 70 10 20 
 3 15 0 3485016 1293944 0  939 0 24 24 0  0  0  0  0  1  892 30760 2842 50  4 46 
 0 18 0 3480520 1289912 0  813 0  0  0 0  0  0  0  0  1  616 27887 2895 31  4 65 
 0 17 0 3476216 1285992 0  547 0  3  3 0  0  0  0  0  0  516 31687 1716 21  2 77 
 1 16 0 3473000 1283232 0  542 0  6  6 0  0  0  0  0  0  663 43993 2112 30  3 67 
 2 16 0 3469568 1280368 0  712 0  8  8 0  0  0  0  0  1  666 39791 3176 34  3 62

Figure 21.2 shows a vmstat trace from the same system during a severe memory shortfall.

Figure 21.2. vmstat trace with severe memory shortfall

 procs      memory              page              disk         faults      cpu
 r  b w    swap  free re  mf pi  po   fr    de  sr m1 m2 s6 sd  in    sy   cs us sy id
 0 31 0 2175384 47800  3 542  1 166  593 38656 141  0  0  0  1 689 26511 3549 19  4 78
 0 27 0 2170608 47552  2 790  4 305 1116 45208 269  0  0  0  2 711 36787 6050 27  6 66
 0 28 0 2168088 48704  4 788  1 190  432 47256  92  0  0  0  2 718 30558 3291 23  4 73
 0 29 0 2164592 47664  1 699  8 158  574 47712 136  0  0  0  1 777 29870 3400 19  3 78
 1 27 0 2162136 48184  2 734  9 140  403 42944 105  0  0  0  2 708 28258 3027 22  4 74
 1 27 0 2158560 47688  0 498  4 166  606 38656 146  0  0  0  1 750 41527 3034 20  4 76
 0 27 0 2155136 47408  1 489  6 240  796 38656 179  0  0  0  1 754 31926 3275 17  4 79
 0 27 0 2151664 47824  1 581  6 187  622 47712 145  0  0  0  2 946 36169 3741 19  5 76

What to Look For

Look for po (pageouts: the kilobytes paged out per second) and sr (scan rate: the number of pages scanned by the clock algorithm). When both are consistently high at the same time (much more than 100 per second, say, on a system with up to 4 Gbytes of memory, more on a larger system), then it is possible the page daemon is being forced to steal free memory from running processes. Do you need to add more memory to the system? Maybe.
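The check described above can be roughed out in a few lines of Python. This is only a sketch: the function names, the 0.8 "consistently high" fraction, and the assumption that the trace uses the column layout shown in Figure 21.2 are all illustrative choices, not part of vmstat itself. The threshold of 100 is the rule of thumb quoted above for systems with up to 4 Gbytes of memory.

```python
from typing import Dict, List

# Three sample lines taken from Figure 21.2 (column-name line plus data).
SAMPLE = """\
 r b w swap free re mf pi po fr de sr m1 m2 s6 sd in sy cs us sy id
 0 31 0 2175384 47800 3 542 1 166 593 38656 141 0 0 0 1 689 26511 3549 19 4 78
 0 27 0 2170608 47552 2 790 4 305 1116 45208 269 0 0 0 2 711 36787 6050 27 6 66
 0 29 0 2164592 47664 1 699 8 158 574 47712 136 0 0 0 1 777 29870 3400 19 3 78
"""


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    """Extract the po and sr values from each data line of a vmstat trace."""
    rows: List[Dict[str, int]] = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line; remember it so po and sr can be located.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # banner line ("procs memory page ...") or truncated line
        if not all(f.isdigit() for f in fields):
            continue
        rows.append({
            "po": int(fields[header.index("po")]),
            "sr": int(fields[header.index("sr")]),
        })
    return rows


def memory_pressure(rows: List[Dict[str, int]],
                    threshold: int = 100,
                    min_fraction: float = 0.8) -> bool:
    """True when po and sr both exceed threshold in most samples.

    min_fraction is an assumed stand-in for "consistently high".
    """
    if not rows:
        return False
    hits = sum(1 for r in rows if r["po"] > threshold and r["sr"] > threshold)
    return hits / len(rows) >= min_fraction


if __name__ == "__main__":
    print(memory_pressure(parse_vmstat(SAMPLE)))
```

A positive result only says that a memory shortfall is possible; as the following paragraphs explain, pageouts and scanning can also have benign causes.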

More memory may not help, though. That might sound crazy, but unfortunately the water is a little muddy here. Some explanation might help clarify the situation.

Pageouts can happen for a number of reasons, including the following:

Dirty (modified) file system pages are being flushed to disk. Such flushing is normal behavior and does not represent a problem. If database files are placed on file system files, expect to see this kind of pageout.
Application pages are being pushed out to the swap device to free up memory for other purposes. If the applications in question are active or about to become active, paging is bad!
New memory has been allocated by an application and swap space is being assigned to it. This, too, is normal behavior and does not represent a problem.
Memory pages have been freed by applications and are being flushed to disk. Isn’t paging a waste of time if the memory is no longer required by the applications? You bet! Solaris 8 introduced a new madvise() flag called MADV_FREE to enable developers to tell the operating system not to bother to flush such pages to swap.

The scan rate is a measure of the activity of the page daemon. The page daemon wakes up and looks for memory pages to free when an application is unable to find enough memory on the free list (memory has fallen to the lotsfree system parameter). The greater the memory shortfall, the faster the page daemon will scan pages in main memory.

The major consumers of memory in a system are:

Applications, including text (binary code), stack (which contains information related to the current phase of execution of the program and its functions), heap (which contains program working space), and shared memory.
The file system page cache, which contains file system data (all file system disk blocks must first be read into memory before they can be used). This cache becomes important when database files are stored on file systems.
The operating system kernel.

Normal Paging Behavior Prior to Solaris 8

Before Solaris 8, the free column reported by vmstat may not be a good indication of the available memory in the system. The reason is that once memory pages are used by the file system page cache, they are not returned to the free list. Instead, the file system data blocks are left in the cache in case they are needed again in the future.

When the page daemon detects a memory shortfall and scans for pages to free, it may well choose to free some of the pages in the file system page cache. If the pages have been modified, they are first flushed to disk. There is no simple way of finding out how much of main memory is being used by the file system page cache at any point, but you can bet it will be substantial if database files are located on UFS files rather than raw partitions. The memtool package (Richard McDougall’s memory monitoring tool), available on the book website, can identify memory use by UFS files.

The problem is that the page daemon may free application memory pages as well as file system page cache pages since it doesn’t know which is which. The result can be severe paging and major performance problems. Adding more memory won’t help much, either. It will simply mean that more database pages can be cached in the memory. Fortunately, there is a solution.

Priority Paging

As of Solaris 7, a new feature called priority paging has been added. Priority paging lowers the priority of file system pages in memory so that the page daemon will choose to free them ahead of application pages. This behavior can make a huge difference to paging problems; priority paging should be enabled wherever databases coexist with active file systems, and especially where database files are placed on file systems.

You can activate priority paging by adding the following line to /etc/system:

set priority_paging = 1

Patches are available for Solaris 2.5.1 and Solaris 2.6 to add priority paging functionality. From Solaris 8, changes to the virtual memory system mean that priority paging is no longer required and should not be used.
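When auditing a number of servers, it can be handy to check whether the setting is present in a copy of /etc/system. The following Python sketch shows one way to do that. The function name is hypothetical, and the code assumes that lines beginning with an asterisk are comments and that the last matching set line is the one that takes effect.

```python
import re

# Matches lines such as "set priority_paging = 1" or "set priority_paging=1".
_SETTING = re.compile(r"^set\s+priority_paging\s*=\s*(\S+)")


def priority_paging_enabled(etc_system_text: str) -> bool:
    """Report whether the given /etc/system contents enable priority paging."""
    enabled = False
    for line in etc_system_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # blank line or comment (assumed comment convention)
        match = _SETTING.match(stripped)
        if match:
            # Assumption: a later setting overrides an earlier one.
            enabled = match.group(1) == "1"
    return enabled


if __name__ == "__main__":
    example = "* enable priority paging\nset priority_paging = 1\n"
    print(priority_paging_enabled(example))
```

On Solaris 8 and later a positive result is a prompt to remove the line rather than a sign of good health, since priority paging should not be used on those releases.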

UFS Files and Paging

If your database files are UFS files rather than raw devices, you may observe significant scanning even once you have enabled priority paging. In fact, the scan rate may increase since priority paging causes the page daemon to become active sooner. This behavior is a natural consequence of the need to bring all database pages into the UFS page cache before they can be accessed by the database. The ongoing need to find free memory gives rise to constant scanning activity on busy database servers using UFS files.

If your application carries out updates, inserts, and deletes, you should also expect to see pageout activity. All database writes must go through the UFS page cache before being written to disk. Although the page being written to disk would previously have been read into the UFS page cache, it might have since been reused if the scan rate is high. In that case the page must be reread from disk before the write to disk can proceed. This process in turn displaces another page, and the cycle continues.

How do you stop all this paging activity? To eliminate scanning (assuming you have enough memory for your applications), either use raw devices or mount your database partitions with the Direct I/O flag (forcedirectio).

A word of caution: eliminating paging activity with Direct I/O may not always result in instant performance improvements. The file system page cache acts as a second-level cache for database pages, and removing it from the picture will expose any inadequacies in the sizing of your database buffer cache. Make sure that your database buffer cache is adequately sized; otherwise, you may find yourself with plenty of free memory and a database buffer cache starved of buffers.

Enabling Direct I/O for database files, and especially for database logs, can offer significant performance benefits as of the Solaris 8 1/01 release; earlier versions of Direct I/O may not offer significant performance gains. Direct I/O is described in more detail in “Unix File System Enhancements” on page 21.

As a final caution, although Direct I/O can prove very useful for database files, do not enable it for nondatabase files without first examining carefully the performance implications of doing so.

Normal Paging Behavior as of Solaris 8

As we have seen, priority paging doesn’t go all the way to solving the problem. Although the page daemon will choose file system pages in preference to application pages, the page daemon still has to search through the whole of memory to find them. And if large database buffer caches are being used, file system pages may only represent a small proportion of the total memory, so a lot of searching will be necessary.

As of Solaris 8, file system pages are separately accounted for, so they can be freed without a memory scan to find them. Consequently, the page daemon is not needed at all unless there is a major memory problem. As a result, the likelihood of paging problems is greatly diminished.

Drilling Down Further

If you want to find out where the memory is going, there are a number of options:

For the final answer on memory consumption, use memtool. The procmem script described below will provide a detailed breakdown of process memory usage. Both memtool and procmem are available on the book website.
Run dmesg and look for Avail mem. The difference between available memory and physical memory (use prtconf or /usr/platform/`arch -k`/sbin/prtdiag to find out the physical memory) indicates the amount of memory reserved for the kernel.
Use /usr/ucb/ps -aux to find out which processes are the major memory hogs. This command lists the percentage of memory used by each process. Beware, though! The memory listed is virtual memory, not physical memory, and it may not be a good indication of how much physical memory is actually being consumed by the process at any given moment.
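The kernel memory calculation in the second option above is simple subtraction, and could be sketched in Python as follows. The output formats assumed here (an "avail mem = N" line from dmesg with N in bytes, and a "Memory size: N Megabytes" line from prtconf) and the sample figures are illustrative assumptions; check them against the output on your own release before relying on the result.

```python
import re
from typing import Optional


def avail_mem_bytes(dmesg_text: str) -> Optional[int]:
    """Pull the available memory figure out of dmesg output.

    Assumes a line of the form "avail mem = <bytes>".
    """
    match = re.search(r"avail mem\s*=\s*(\d+)", dmesg_text, re.IGNORECASE)
    return int(match.group(1)) if match else None


def physical_mem_bytes(prtconf_text: str) -> Optional[int]:
    """Pull the physical memory figure out of prtconf output.

    Assumes a line of the form "Memory size: <n> Megabytes".
    """
    match = re.search(r"Memory size:\s*(\d+)\s*Megabytes", prtconf_text)
    return int(match.group(1)) * 1024 * 1024 if match else None


def kernel_reserved_bytes(dmesg_text: str, prtconf_text: str) -> Optional[int]:
    """Physical memory minus available memory, or None if either is missing."""
    avail = avail_mem_bytes(dmesg_text)
    physical = physical_mem_bytes(prtconf_text)
    if avail is None or physical is None:
        return None
    return physical - avail


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real system.
    dmesg_sample = "avail mem = 2087829504"
    prtconf_sample = "Memory size: 2048 Megabytes"
    print(kernel_reserved_bytes(dmesg_sample, prtconf_sample))
```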

The procmem script requires you to install the unbundled memtool package since it uses the pmem program (similar to the standard Solaris pmap program, but with various bugs fixed on some Solaris releases). Some users are unwilling to install an unbundled package on a production system; an alternative version based on pmap is also available. The procmem script summarizes memory use for all processes and gives a breakdown into resident, shared, and private memory usage. Please note that both procmem and memtool are unsupported software.

Passing the -h parameter to procmem results in the following usage information:

usage: procmem [-v] [-h | -p pidlist | [ -u username ] [ searchstring ]]
Examples:
procmem -p 10784 10759
      - show memory usage for processes with pids 10784 10759
procmem -u root
      - show memory usage for all processes owned by the root user
procmem -u "daemon root"
      -  show memory usage for all processes owned by the root & daemon users
procmem netscape
      - show memory usage for process(es) in 'ps -ef' with "netscape"
procmem -u fred netscape
      - show memory usage for "netscape" processes owned by fred
procmem
      - show memory usage for all processes (provided current user has
        superuser access privileges)
Definition of terms
  'Kbytes' is the total memory size of the process or file.
  'Resident' is that portion currently occupying physical memory.
  'Shared' is resident memory capable of being shared.
  'Private' is resident memory unique to this process or file.
  Resident = Shared + Private
Sizing
  For reporting purposes, the 'Shared' component has been counted once
  only while the 'Private' component has been summed for each process
  or file. The /usr/lib shared libraries have been reported separately
  since they tend to be widely used across applications. To be totally
  accurate, though, the shared component of these shared libraries
  should only be counted once across all applications, not once for
  every group of applications. The same logic may apply to other
  shared libraries also used by multiple applications.

The -v flag offers additional detail. An example of procmem output follows for all processes on a server running an Oracle database.

Processes                                   Kbytes Resident   Shared  Private
---------                                   ------ --------   ------  -------
Process Summary (Count)
  -csh (3)                                    5376     2456     1064     1392
  -ksh (8)                                   14624     2912     1456     1456
  automountd (1)                              4088     3656     1792     1864
  cimomboot (1)                               1576     1384     1256      128
  cron (1)                                    1936     1744     1464      280
  devfsadmd (1)                               2776     2536     1592      944
  devfseventd (1)                             1272     1232      952      280
  dmispd (1)                                  3160     2648     1744      904
  dtlogin (1)                                 4920     2856     2192      664
  dwhttpd (2)                                20080     7440     4496     2944
  esd (2)                                    24712    21768     3880    17888
  grep (1)                                     968      936      832      104
  in.ndpd (1)                                 1856     1488     1304      184
  in.rdisc (1)                                1616     1416     1272      144
  in.rlogind (6)                             10368     2464     1360     1104
  inetd (1)                                   2648     2384     1384     1000
  init (1)                                    1888     1608     1136      472
  iostat (1)                                  1824     1784      904      880
  ksh (5)                                     9040     2320     1456      864
  lockd (1)                                   1896     1656     1264      392
  lpsched (1)                                 3040     1768     1552      216
  mibiisa (1)                                 2952     2752     1504     1248
  mountd (1)                                  2952     2536     1528     1008
  nfsd (1)                                    1888     1672     1264      408
  nscd (1)                                    3200     2928     1528     1400
  ora_ckpt_bench (1)                       1405264  1372488  1371824      664
  ora_dbw0_bench (1)                       1406976  1374208  1371824     2384
  ora_dbw1_bench (1)                       1406968  1374200  1371824     2376
  ora_dbw2_bench (1)                       1406968  1374200  1371824     2376
  ora_dbw3_bench (1)                       1406968  1374200  1371824     2376
  ora_lgwr_bench (1)                       1405248  1372472  1371824      648
  ora_pmon_bench (1)                       1405696  1372920  1371824     1096
  ora_reco_bench (1)                       1405160  1372384  1371824      560
  ora_smon_bench (1)                       1405192  1372424  1371824      600
  oraclebench (50)                        70258944  1400768  1371824    28944
  powerd (1)                                  1632     1576      976      600
  rpcbind (1)                                 2584     2088     1296      792
  sac (1)                                     1736     1488     1296      192
  sendmail (1)                                2936     2280     1816      464
  sh (4)                                      4176     1320      912      408
  snmpXdmid (1)                               3744     3184     2024     1160
  snmpdx (1)                                  2144     1920     1520      400
  statd (1)                                   2592     2208     1464      744
  syslogd (1)                                 4192     3008     1456     1552
  tail (1)                                     968      936      792      144
  tee (2)                                     1808      952      792      160
  tpccload (50)                             418800    38944     3344    35600
  ttymon (2)                                  3464     1744     1336      408
  utmpd (1)                                   1000      944      816      128
  vmstat (1)                                  1312     1280      808      472
  vold (1)                                    2696     2408     1768      640
  vxconfigd (1)                              14656    13944     1328    12616
-----------------------------------------------------------------------------
File (Count)                                Kbytes Resident   Shared  Private
-----------                                 ------ --------   ------  -------
/usr/lib Shared Library Totals              291936    27928     2856    25072
Other Shared Library Totals                1473544    29304     7248    22056
Mapped File Totals                             560      488      488        0
Binary File Totals                         1378632    18864    13328     5536
Shared Memory Totals                      80280128  1360688  1360688        0
Anonymous Memory Totals                      89680    84008        0    84008
-----------------------------------------------------------------------------
Grand Totals                              83514480  1521280  1384608   136672

The bulk of the 1.5 Gbytes of resident memory used on this server is accounted for by 1.3 Gbytes of shared memory, which also constitutes the major component of the memory used for the Oracle processes.
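Saved procmem output can be sanity-checked against the relationship Resident = Shared + Private given in the usage text, and the shared memory share of resident memory can be worked out at the same time. The Python sketch below assumes the four-column layout shown in the example output; the function names are hypothetical and the sample rows are copied from that output.

```python
import re
from typing import Dict, List, Tuple

# A row is a name followed by four integer columns:
# Kbytes, Resident, Shared, Private.
_ROW = re.compile(r"^\s*(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

SAMPLE = """\
  -csh (3)                                    5376     2456     1064     1392
  oraclebench (50)                        70258944  1400768  1371824    28944
Shared Memory Totals                      80280128  1360688  1360688        0
Grand Totals                              83514480  1521280  1384608   136672
"""


def parse_procmem(text: str) -> Dict[str, Tuple[int, int, int, int]]:
    """Map each row name to (kbytes, resident, shared, private)."""
    rows: Dict[str, Tuple[int, int, int, int]] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            kbytes, resident, shared, private = (int(v) for v in match.groups()[1:])
            rows[match.group(1)] = (kbytes, resident, shared, private)
    return rows


def inconsistent_rows(rows: Dict[str, Tuple[int, int, int, int]]) -> List[str]:
    """Names of rows where Resident differs from Shared + Private."""
    return [name for name, (_, resident, shared, private) in rows.items()
            if resident != shared + private]


def shared_memory_fraction(rows: Dict[str, Tuple[int, int, int, int]]) -> float:
    """Resident shared memory segments as a fraction of total resident memory."""
    return rows["Shared Memory Totals"][1] / rows["Grand Totals"][1]


if __name__ == "__main__":
    parsed = parse_procmem(SAMPLE)
    print(inconsistent_rows(parsed))
    print(round(shared_memory_fraction(parsed), 3))
```

Header and separator lines do not match the row pattern and are skipped, so the full report can be passed in unchanged.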

The script can be used to report the physical and virtual memory consumption for a group of processes. For example, procmem ora will report memory consumption for all processes that have the string ora in a ps -ef report (Oracle processes typically meet this criterion). If another Oracle user running the same applications is added to the system, you would not expect an increase in the Shared component of memory for /usr/lib shared libraries, other shared libraries, binary files, or shared memory segments. The Private component would be expected to grow, though, for the shared libraries, the mapped files, and anonymous memory. The additional private memory required would probably be roughly equivalent to the private memory total for these applications divided by the current number of users.
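The sizing rule in the last sentence amounts to a single division. As a rough Python sketch, with hypothetical function names and the assumption that the 50 oraclebench processes in the example output correspond to 50 users:

```python
def private_kb_per_user(private_total_kb: float, current_users: int) -> float:
    """Average private memory per user, in Kbytes."""
    if current_users <= 0:
        raise ValueError("current_users must be positive")
    return private_total_kb / current_users


def projected_private_kb(private_total_kb: float,
                         current_users: int,
                         added_users: int) -> float:
    """Estimated private memory total after adding users at the same average."""
    per_user = private_kb_per_user(private_total_kb, current_users)
    return private_total_kb + added_users * per_user


if __name__ == "__main__":
    # 28944 Kbytes is the Private figure for oraclebench (50) in the
    # example output; treating that as 50 users is an assumption.
    print(private_kb_per_user(28944, 50))
    print(projected_private_kb(28944, 50, 10))
```

The estimate covers private memory only. The shared components are left out because, as noted above, they are not expected to grow when another user running the same applications is added.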

Detail is available for all processes, as well as summaries for /usr/lib shared libraries (which tend to be used by many processes throughout a system and so should be counted only once for sizing purposes), other shared libraries (for example, Oracle shared libraries), mapped files (memory-mapped file system files), binary files (executable programs), shared memory segments, and anonymous memory (heap and stack).

The procmem script will accurately show all memory directly used by processes, but not memory belonging to UFS files that are resident in the file system page cache. Since pages from UFS database files are not directly mapped into the address spaces of database processes, they will not appear in the totals. The memps -m command from memtool provides this information (it requires the memtool kernel module, installed when the memtool package is first set up, to be loaded).

What You Can Do to Reduce Paging

If you are using file systems for your database files, the first step is to upgrade to Solaris 8 or else enable priority paging for earlier releases of Solaris. If necessary, you could also consider the following steps to relieve memory pressures on your database server:

Add more memory to the system.
Use Direct I/O for database files.
Reduce the size of the database buffer cache. This reduction may result in additional database I/O, but that is almost always preferable to paging.
Remove applications from the server. If applications are running on the server, move them to a client system and run the applications in client/server mode. Memory should be freed up as a result.
Reduce the number of users on the system.
