PATROL® Knowledge Module for Unix V8.3 – Features and FAQs

  • Background
  • Why Read the Kernel?
  • There Has to Be a Catch
    • Initial development
    • Sustaining engineering
  • How Do I Know It's Right?
  • Data Availability
  • Targeted Usage
  • How Does It Work?
  • Is That All There Is?
    • Composite parameters
    • Log file scanning
    • Process presence
  • Summary
  • Frequently Asked Questions

Background

Since the release of V2.0 of PATROL, the PATROL Knowledge Module for Unix has used Unix command-line utilities as its primary source for reported system data. The concept of parsing human-readable text to extract the desired data was one of the principles on which PATROL was developed. This method has served customers well by providing a convenient way to bring data from user-written utilities into the product for monitoring and history retention. While relatively easy to implement, this technique is not the most efficient interface method, and if not carefully implemented can introduce undesirable overhead, particularly on larger systems. The market has for some time been requesting that PATROL collect Unix data by directly accessing the Unix kernel, rather than via command-line utilities.

The merger of BMC Software and BGS Systems in 1998 provided access to BEST/1® technology that extracts Unix operating system data directly from the kernels of the leading Unix vendors, and access to developers with knowledge and experience working with kernel readers. Immediately following the merger, work began to interface the BEST/1-Collect component from the BEST/1 product (now PATROL for Unix – Perform) with the PATROL KM for Unix, eliminating the use of as many command-line utilities as possible. This work has been very successful and is now incorporated in V8.3 of the PATROL KM for Unix component of PATROL for Unix.

Why Read the Kernel?

There are two main benefits associated with direct kernel access. First, there is a performance improvement realized by eliminating the conversion of reported data from human-readable to machine-usable format. Second, there is increased data integrity. A particular advantage of using the PATROL for Unix – Perform collector (Perform collector) component is that it was developed for mathematical modeling of system resource utilization for lifecycle performance management purposes, which requires extremely accurate data. There have been several cases where PATROL data perceived as inaccurate was traced to inaccurate output from the command-line operating system utilities. BMC Software developers encountered this when comparing data from the Perform collector to the output from command-line operating system utilities. Through vendor contacts and independent research, the BMC Software developers have ensured that the Perform collector reports correct data.

There Has to Be a Catch

There are a number of issues surrounding direct kernel access that should be mentioned in any discussion of the topic. Two of these – initial development effort and sustaining effort – are primarily of concern to BMC Software. These issues weigh heavily on the decision to invest in a kernel reader for a given platform. Two more issues – the difficulty in readily verifying the collected data and the fact that not all data required is available directly from the kernel – concern both BMC Software and end users.

Initial development

Most Unix vendors are not willing to share the source code for their product, nor do they provide documented application programming interfaces for accessing kernel data structures. This leaves the kernel-reader developer with no choice but to reverse-engineer the kernel using data structure definitions (or inferences) gleaned from a variety of sources, including sysgen header and library files. While most of the various commercial Unix implementations share a common ancestor (SVR4) and many common exterior features, their internal compositions are very different. That means each vendor's kernel, and in some cases specific kernel versions from the same vendor, must be researched and analyzed, and data retrieval methods adapted for each one.

Sustaining engineering

Since the kernel-reader developer doesn't have access to the Unix source, the only way to determine whether or not an existing collector definition is affected by a new vendor release is to test it, and that presents the first problem – logistics. It is nearly impossible to ensure that a working kernel-reading collector will be available simultaneously with the vendor's release of a new operating system version. At a minimum, the existing collector must be thoroughly tested. More typically, the collector must be altered and retested before being made generally available.

A more severe problem arises when a vendor issues a limited-distribution patch or hot-fix that affects a kernel structure. Depending on the nature of the change, a kernel-reading collector may remain unaffected or may be severely affected. A kernel change may even render the collector unusable until it is patched.



- Figure 1. PATROL KM for Unix block diagram

How Do I Know It's Right?

The "garbage in, garbage out" concept is widely understood, so it is only natural for an administrator to want to verify that the data being reported by the collector is accurate. The obvious answer is to run sar or vmstat and compare the numbers. Unfortunately, the way Unix is instrumented makes direct comparisons like this nearly impossible.

Unix performance is tracked internally using numerous counters, each of which is incremented when a given criterion for that counter is met. For example, when a program performs a streamed write to a disk file, only a logical write counter for the disk may be incremented if the desired file block is available in cache. The disk's physical write counter would not be incremented until the cache is flushed to disk.

All performance tools reporting rates (such as writes-per-second) periodically read the counters, calculate the difference from the previous value, and divide the difference by the time interval to normalize the result to a per-second rate. Therefore, for two tools to report identical numbers, they would both have to read the same counter simultaneously, wait for precisely the same interval, and read the counter again simultaneously. The realities of a multitasking operating system yield an extremely low probability of this occurring. The disparity can be even greater if one of the tools is launched ad hoc. If, for example, the Perform collector posts a data point for overall CPU use of 60% and vmstat(1) (as in vmstat 5 2) is run in an attempt to confirm that number, the operating system utility can only report on the snapshot usage over the 10-second interval it runs, but the Perform collector may have calculated 60% by averaging the usage over the last three minutes. Imagine that the 10-second snapshot happens to span a 98% utilization spike (or a 10% lull), and the problem becomes clear.
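The counter arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names cumulative and rate_per_second are hypothetical, the workload numbers are invented rather than measured, and nothing here reflects how the Perform collector or vmstat is actually implemented.

```python
def cumulative(increments):
    """Build a running counter from per-second increments, as a kernel counter would grow."""
    counter = [0]
    for value in increments:
        counter.append(counter[-1] + value)
    return counter


def rate_per_second(counter, start, end):
    """Average rate between two readings: counter difference divided by elapsed seconds."""
    if end <= start:
        raise ValueError("end must be later than start")
    return (counter[end] - counter[start]) / (end - start)


# Invented workload (not measured data): CPU busy percentage accumulated once per
# second for three minutes, ending with a 10-second spike.
busy_percent = [58] * 170 + [94] * 10
counter = cumulative(busy_percent)

three_minute_average = rate_per_second(counter, 0, 180)   # long, collector-style window
ten_second_snapshot = rate_per_second(counter, 170, 180)  # short, ad hoc window

print(three_minute_average, ten_second_snapshot)  # prints: 60.0 94.0
```

Both figures are computed correctly from the same counter, yet they differ widely because the two windows cover different spans of time.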

On the other hand, data collected over a period of time by independent tools sampling at similar intervals should exhibit similar trends. Long-term trending, then, is the only valid means of checking collector accuracy in the field.

Data Availability

A typical Unix implementation includes optional subsystems that are not part of the Unix kernel, such as the lp (print spooler) and NFS subsystems. Since instrumentation data from these subsystems isn't kept in kernel data structures, the PATROL KM for Unix will have to continue to rely on lpstat and nfsstat for information about them. There may be additional utility dependencies on specific platforms or versions.

Targeted Usage

Careful consideration of the issues in the previous discussion of the advantages and difficulties of direct kernel access has led to the decision to use kernel readers as the data source for the PATROL KM for Unix only on platforms that make up the bulk of PATROL installations. Those platforms are AIX, Compaq Tru64, HP-UX and Sun Solaris (Reliant Unix support is planned for late 2000). The remaining platforms supported by PATROL will continue to rely on Unix operating system utilities for all reported data.

Debugging potential problems may be more time-consuming than in the past. If collection problems are identified, fixes must come from BMC Software. It will no longer be possible to simply edit a PATROL Script Language (PSL) script to pick up a different field, or to divide by a different scaling factor to correct a problem. Conversely, PATROL will no longer be affected by incorrect data reported by operating system command-line utilities. If a collected value is wrong, the Perform collector can be modified to correct it.

The Perform collector was developed independently from PATROL and is being used by PATROL in exactly the same form as it is used by PATROL for Unix – Perform. This presents the opportunity for the two products to share a single data source (if the installed PATROL KM and PATROL for Unix – Perform versions are compatible), but it also requires that the two tools report on the same set of metrics. In reconciling the differences, some parameters have been deleted from the PATROL KM, some have been added to the Perform collector, and others are being derived for the PATROL KM from related Perform collector data. An example of one of these tradeoffs is that while fewer SMP parameters are reported on Solaris, the same SMP parameters are now reported on Solaris, HP-UX, AIX and Tru64.

It also means that a certain degree of trust in the product is required. Extensive debug messaging has been included in the PATROL KM. If analysis of the debug data indicates that PATROL is correctly reporting data from the kernel and that data can be shown to follow trends similar to data from operating system utilities, then the PATROL data must be accepted as accurate even though it may disagree with specific numbers from operating system utilities.

How Does It Work?

A new process called the Data Collection Manager (DCM) serves as the control and data interface between the PATROL KM and the Perform collector. A block diagram illustrating the relationships of the various components is shown in Figure 1.

Note that shared memory is the default communication method between the Perform collector and the DCM. While the Perform collector may be configured to use memory-mapped files for communication, that configuration is not supported in this release of the PATROL KM for Unix. The Custom DLL block in Figure 1 represents specialized software integrated into the PATROL Agent to improve performance when handling the large amount of data typically returned by the Perform collector.

Is That All There Is?

While integration of the Perform collector is arguably the most significant new feature in this version of the PATROL KM, it is by no means the only one.

Composite parameters

This feature was introduced in 1998 in V3.5 of the PATROL Knowledge Module for Microsoft Windows NT and has been replicated in this version of the PATROL KM for Unix. It allows a user to define Boolean combinations of PATROL parameters and constants in an expression-builder dialog to create a new logical parameter. For example, CPU > 90 AND PageOuts > 1 could be used to warn of a physical memory deficit. Where these are defined, it usually makes sense to deactivate the alarm ranges on the individual parameters. The Composite Parameters enhancement is available on all supported Unix variants.
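The logic of a composite parameter can be sketched in Python as follows. The parameter names CPU and PageOuts come from the example above; everything else (the COMPARATORS table, the tuple layout of a condition, the function names) is a hypothetical illustration and not the expression-builder's internal representation.

```python
import operator

# Hypothetical mapping from comparison symbols to functions.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def condition_holds(values, name, symbol, constant):
    """Compare the current value of one parameter against a constant."""
    return COMPARATORS[symbol](values[name], constant)


def composite_and(values, conditions):
    """True only when every (name, symbol, constant) condition holds."""
    return all(condition_holds(values, *cond) for cond in conditions)


# The rule from the text: CPU > 90 AND PageOuts > 1.
memory_deficit = [("CPU", ">", 90), ("PageOuts", ">", 1)]

# Invented sample readings.
busy_and_paging = {"CPU": 95, "PageOuts": 4}
busy_only = {"CPU": 95, "PageOuts": 0}

print(composite_and(busy_and_paging, memory_deficit))  # prints: True
print(composite_and(busy_only, memory_deficit))        # prints: False
```

The second reading shows why the individual alarm ranges are usually deactivated: a busy CPU alone would otherwise raise an alert that the composite rule is meant to suppress.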

Log file scanning

This enhancement to the existing LOGS application allows the user to specify Unix regular expressions that, if detected in a specified log file, result in a user-defined alert state (WARN or ALARM). More than one expression can be defined for each file, and provision is made for viewing the last N lines of the file to assist in identifying the actual message text. This functionality can be applied directly to ASCII files and to encoded (binary) files, provided a filter or dump utility is available. In the binary case, the file is periodically dumped to ASCII, the ASCII file is scanned, and then deleted. An example of this would be using fwtmp to periodically dump /etc/btmp to monitor for failed root login attempts. The scanning function is implemented in a daemon executable for efficiency. This enhancement is available on all supported Unix variants.
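The scanning idea can be sketched in Python. This is an assumption-laden illustration, not the daemon's implementation: the function names, the rule format, and the sample log lines are all invented, and Python's re syntax differs in places from Unix regular expressions.

```python
import re

# Ordering of alert states, used to keep the most severe one.
SEVERITY = {"OK": 0, "WARN": 1, "ALARM": 2}


def scan_lines(lines, rules):
    """Return the most severe matching state and a list of (state, line) hits.

    rules is a list of (regular expression, state) pairs; several may apply to one file.
    """
    compiled = [(re.compile(pattern), state) for pattern, state in rules]
    worst = "OK"
    hits = []
    for line in lines:
        for pattern, state in compiled:
            if pattern.search(line):
                hits.append((state, line))
                if SEVERITY[state] > SEVERITY[worst]:
                    worst = state
    return worst, hits


def tail(lines, n):
    """Last n lines of the file, to help locate the actual message text."""
    return lines[-n:] if n > 0 else []


# Invented log content and rules.
sample_log = [
    "Jan 10 03:12:01 host su: failed login for root on tty1",
    "Jan 10 03:12:09 host cron: job started",
    "Jan 10 03:13:44 host disk: /var is 91% full",
]
rules = [(r"failed login for root", "ALARM"), (r"[0-9]+% full", "WARN")]

state, hits = scan_lines(sample_log, rules)
print(state, len(hits))  # prints: ALARM 2
```

For a binary file, the same function would be applied to the text produced by the dump utility rather than to the file itself.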

Process presence

In place of the removed ACTIVE_PROCESS application class, this enhancement to the PROCESS application allows users to select processes for monitoring by text string. Each specification is a search string found in the command-line text of the process to be monitored. When a process is found to have the search string in its command text, PATROL begins monitoring that process. There are PATROL parameters to track the process's CPU and memory use, as well as the number of processes where the string was found. By setting alarm ranges on the PROCPPCount parameter, the user can define the alert level (WARN or ALARM) that results from the process no longer being found in the process list, or from the occurrence count falling below the desired number. For example, this feature can be used to alert when init fails to restart a daemon or when the number of biod processes falls below the desired number. By setting the parameter alarm range to WARN or ALARM at 1, it is also possible to alert when a configured process exists (starts). Since the Process Presence enhancement relies on data collected by the Perform collector, it is only available on Sun Solaris, HP-UX, AIX and Tru64 (Reliant Unix in late 2000).
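A minimal Python sketch of the matching and counting logic follows. PROCPPCount is the parameter named above; the function names, the minimum-count convention, and the sample command lines are hypothetical and are not taken from the KM.

```python
def count_matches(command_lines, search_string):
    """Number of processes whose command-line text contains the search string."""
    return sum(1 for command in command_lines if search_string in command)


def presence_state(count, minimum, state_below="ALARM"):
    """Alert state when fewer than `minimum` matching processes are running."""
    return state_below if count < minimum else "OK"


# Invented process list (command-line text only).
commands = [
    "/usr/sbin/biod 4",
    "/usr/sbin/biod 4",
    "/usr/sbin/biod 4",
    "/usr/sbin/inetd -s",
]

biod_count = count_matches(commands, "biod")      # plays the role of PROCPPCount
print(biod_count, presence_state(biod_count, 4))  # prints: 3 ALARM
print(presence_state(count_matches(commands, "inetd"), 1))  # prints: OK
```

Here three biod processes are found where four are wanted, so the state is ALARM, while the single inetd process satisfies its minimum of one.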

Summary

Version 8.3 of the PATROL KM for Unix is a significant technological and product integration milestone for PATROL and BMC Software. It addresses issues that have been perceived for a number of years as deficiencies in PATROL. It is also very different in some fundamental ways from any other PATROL KM, and thus requires some adjustments in the approaches taken for implementation and troubleshooting.

About BMC Software

One of the world's largest independent software vendors, BMC Software delivers the most comprehensive e-business systems management software with the fastest guaranteed implementation. This Service Assurance strategy enhances the availability, performance and recoverability of companies' business-critical applications. Companies can use this management methodology to demonstrate their ability to deliver optimal service to their customers and partners by joining BMC Software OnSite, a certification program which includes solution implementation and regular HealthChecks performed by BMC Software Professional Services.

BMC Software is a Forbes 500 company and a member of the S&P 500, with fiscal year 2000 revenues exceeding $1.7 billion. The company is headquartered in Houston, Texas, with offices worldwide.

Frequently Asked Questions

Q: Can the PATROL KM for Unix and PATROL for Unix – Perform share a common Perform collector process?

A: Yes, if both installations use the same version of the Perform collector (V6.3 with KM version 8.3.00 and V6.5 with KM version 8.3.01). See the PATROL KM for Unix User Guide for details.

Q: Why are parameters from previous versions no longer available?

A: Their data is not available from the Perform collector. If there is a particular parameter you consider vital, please submit an enhancement request.

Q: Can I run the PATROL KM for Unix V8.3 on some systems and an older version on other systems?

A: Only if the two groups of systems connect to different PATROL Consoles. The PATROL KM definitions must correspond on the PATROL Console and PATROL Agent, and only one version can be loaded on the PATROL Console, so the two environments must remain isolated.

Q: I am not a PATROL for Unix – Perform customer. How do I get the Perform collector?

A: Everything required to support the PATROL KM (Perform collector, DCM, PATROL KM and PSL files, etc.) is packaged and installed with the PATROL for Unix – Perform product.

Q: How does the Perform collector process get started?

A: Code in the DCM determines whether to connect to an existing collector process or start a new one. It is transparent to the user.