[Cockcroft98] Section 15.1——15.6

来源:百度文库 编辑:神马文学网 时间:2024/04/29 03:54:37

Chapter 15. Metric Collection Interfaces

Thereare many sources of performance information in Solaris; this chaptergives detailed examples that show how to get at all the available data.

Standards and Implementations

Thereare well-defined, portable standards for the many Unix programminginterfaces. The System V Interface Definition, POSIX, and X/Open Unix95all tie down various aspects of the system so that application programscan be moved among systems by recompilation. Beneath these interfaceslie implementations that have a common ancestry but that are divergingrapidly as each Unix vendor seeks to gain a feature, performance, orscalability advantage. Although the interface is well defined, theperformance-oriented behavior and limitations are not closely defined.For management of the performance of an implementation,implementation-specific metrics are collected and provided toperformance management tools. Some of these metrics, especially thoserooted in physical operations like network packet counts, are welldefined and portable. Other metrics vary according to the underlyinghardware, for example, collision counts, which are under-reported bysome Ethernet interfaces and don’t exist for FDDI. Some metrics monitorsoftware abstractions, such as transmit buffer availability, that canvary from one operating system to another or from one release ofSolaris to the next.

Bytheir nature, performance management applications must be aware of themetrics and behavior of the underlying implementation. This necessityforces performance management applications to be among the leastportable of all software packages. The problem is compounded by lack ofstandardization of the metric collection interfaces and common metricdefinitions. On top of all that, each vendor of performance toolscollects a different subset of the available metrics and stores them ina proprietary format. It is very hard to use one vendor’s tool toanalyze or browse data collected with a different vendor’s tool.

The X/Open Universal Measurement Architecture Standard

TheX/Open UMA standard attempts to define a commonality for metricsmeasurements: the UMA Data Pool Definition (DPD), a common set ofmetrics; the UMA Data Capture Interface (DCI), a common collectioninterface; and the UMA Measurement Layer Interface (MLI), a common wayfor applications to read data.

Ipersonally worked on the DPD to make sure that it contains portable,well-defined, and useful metrics. However, I think the DPD metrics aretoo low level, based mainly on the raw kernel counters; it would bebetter to collect and store per-second rates for many metrics togetherwith higher-level health and status indicators that would be far moreportable. Still, if you ever decide to build a tool from scratch, theData Pool Definition does summarize what is common across all Uniximplementations and a set of extensions to raise the bar somewhat. Thenearest thing to a full implementation of the final Data PoolDefinition is a prototype for Solaris written in SE, which Graham Hazelwrote for me.

Iam less keen on the two interfaces—they seem more complex than theyshould be. While there are reference implementations of both, thereference DCI is implemented on a Hitachi version of OSF/1, which doesnot have an MLI to go with it, and the reference MLI is implemented inthe Amdahl AUMA product, which does not use a DCI interface and isbased on a prerelease of the standard. I personally think that aninterface based on a Java Remote Method Invocation (RMI) would be moreportable and useful than the DCI and MLI.

Inthe end, though, there has not been enough interest or demand to givevendors a good business case to make the difficult move from their ownformats and interfaces.

The Application Resource Measurement Standard

TheARM standard aims to instrument application response times. The hope isthat by tagging each ARM call with a transaction identifier andtracking these transactions as they move from system to system, we canmeasure end-to-end response time. When a user transaction slows downfor whatever reason, the system in the end-to-end chain of measurementsthat shows the biggest increase in response time can be pinpointed.

Thisplan sounds very useful, and there is both good news and bad news aboutARM. The good news is that all vendors support the one standard, andseveral implementations exist, from HP and Tivoli (who jointly inventedthe standard) and, more recently, BGS. The bad news is that to measureuser response time, application code must be instrumented andsophisticated tools are needed to handle the problems of distributedtransaction tracking. There does seem to be some interest from theapplication vendors, and over time, more measurements will becomeavailable.

Toencourage this effort, tell your application and database vendors thatyou want them to instrument their code according to ARM so that you canmanage user response times more effectively.

Solaris 2 Native Metric Interfaces

Solaris2 provides many interfaces to supply performance and status metrics.The SE Toolkit provides a convenient way to get at the data and use it,but if you want to implement your own measurement code, you need toknow how to program to the interfaces directly from C code. RichardPettit, the author of the SE language, contributed the followingsections that explain how to program to the “kvm,” “kstat,” “mib”, and “ndd” interfaces used by SE. I added information on processes, trace probes, therstat protocol, and configuration information.

The Traditional Kernel Memory Interface

Thekvmlibrary is the legacy interface for accessing kernel data in SunOS.Although it was available in SunOS 4.x, it is still not a librarywidely used by other Unix operating systems. The name stands for“kernel virtual memory,” which provides the data on which the libraryoperates. This data takes the form of variables that provide feedbackregarding the state of the operating system. From this data, you canextrapolate information regarding the relative performance of thecomputer. Performance analysis applications such asvmstat originally used this interface.

Thekvmlibrary provides a robust interface to accessing data within theaddress space of an operating system. This access includes a runningoperating system or the disk image of a dump of a running kernel, suchas the result of a system crash. The files from which this data is readinclude the “core file” and the “swap file”; these files accommodatethe situations when data to be read is no longer present in physicalmemory but has been written to the swap file, as is the case when theuser area (u-area) of an application program is read—one of thecapabilities of the kvm library.

On a system without a kvm library, you can create a simplified version of the library by opening the file/dev/kmem,which is a character special file that provides user-level access tokernel memory. You can retrieve symbols and their addresses withinkernel memory by searching the name list of the operating system. Then,you can use the address of a variable as an offset to thelseek system call once /dev/kmem has been opened. The value of the variable can then be retrieved by theread system call. Data can also be written by means of the writesystem call, although this capability is quite dangerous and you shouldavoid using it. Instead, use system utilities that are designedspecifically for modifying tunable parameters in the kernel.

Anapplication program needs to know about the type and size of thevariable being read from the kernel. Thus, this type of program isnonportable—not only to other Unix systems but even between releases ofSolaris. The specifics of the internals of the operating system are notguaranteed to remain the same between releases. Only the documentedinterfaces such as the Device Driver Interface and Driver KernelInterface (DDI/DKI) are obliged to remain consistent across releases.

The “kvm”interface exposes the kernel data types to the application. When a64-bit address space capability is added to Solaris, the kernel datatypes will include 64-bit addresses, and applications that still wantto accesskvm in this environment will have to be compiled in 64-bit mode. These applications include many performance tool data collectors.

Symbols

Variablesin the kernel are the same as those in an application program. Theyhave a type, a size, and an address within the memory of the runningprogram, this program being the operating system. The library call forextracting this information from a binary isnlist. The declaration fornlist fromnlist.h is

extern int nlist(const char *namelist, struct nlist *nl);

The first parameter is the path name of a binary to be searched for symbols. The second parameter is an array of structnlist, which contains the names of the symbols to be searched for. This array is terminated with a name value of zero.

Figure 15-1 shows the declaration of thenlist structure fromnlist.h. Figure 15-2 is an example declaration of an array ofnlist structures to be passed to thenlist library call.

Figure 15-1. The nlist Structure Declaration
struct nlist {
char *n_name; /* symbol name */
long n_value; /* value of symbol */
short n_scnum; /* section number */
unsigned short n_type; /* type and derived type */
char n_sclass; /* storage class */
char n_numaux; /* number of aux. entries */
};

Note that in Figure 15-2, the only member of the aggregate that has an initialization value is then_name member. This is the only a priori value and is all that is required for the library call to do its job.

Figure 15-2. Example Declaration Ready for nlist Call
static struct nlist nl[] = {
{"maxusers"},
{ 0 }
};

Once the call tonlist is successful, then_value member will contain the address of the variable that can be used as the offset value to thelseek system call. Figure 15-3 shows the example call tonlist.

Figure 15-3. Call to nlist
if (nlist("/dev/ksyms', nl) == -1) {
perror("nlist");
exit(1);
}

In Figure 15-3,the name of the binary specified is actually a character special filethat allows access to an ELF (executable and linking format) memoryimage containing the symbol and string table of the running kernel.This interface allows access to symbols of loadable device driverswhose symbols would not appear in the/kernel/genunix binary but are present in the running kernel. To illustrate, create a local version of the kernel symbols by copying withdd and inspect the resulting binary (this example used Solaris x86) by using thefile andnm commands.

Code View:Scroll/Show All
#  dd if=/dev/ksyms of=ksyms.dump bs=8192 
# file ksyms.dump
ksyms.dump: ELF 32-bit LSB executable 80386 Version 1, statically linked, not
stripped
# /usr/ccs/bin/nm ksyms.dump | egrep '(Shndx|maxusers)'
[Index] Value Size Type Bind Other Shndx Name
[6705] |0xe0518174|0x00000004|OBJT |GLOB |0 |ABS |maxusers


The value field of0xe0518174 represents the address of themaxusers variable and the offset value passed tolseek. You will see this again in the complete example program.

Extracting the Values

Wehave laid the foundation for retrieving the data from kernel memory,and now we can undertake the business of seeking and reading. First, weneed a file descriptor. Opening/dev/kmemis not for the average user, though. This character special file isreadable only by root or the group sys. And only root may write to/dev/kmem. Figure 15-4 shows the complete demo program.

Figure 15-4. Accessing Kernel Data Without thekvm Library
Code View: Scroll / Show All
/* compile with -lelf */ 

#include
#include
#include
#include
#include

main()
{
int fd;
int maxusers;
static struct nlist nl[] = {
{ "maxusers"},
{ 0 }
};

/* retrieve symbols from the kernel */
if (nlist("/dev/ksyms", nl) == -1) {
perror("nlist");
exit(1);
}
/* open kernel memory, read-only */
fd = open("/dev/kmem", O_RDONLY);
if (fd == -1) {
perror("open");
exit(1);
}
/* seek to the specified kernel address */
if (lseek(fd, nl[0].n_value, 0) == -1) {
perror("lseek");
exit(1);
}
/* read the value from kernel memory */
if (read(fd, &maxusers, sizeof maxusers) == -1) {
perror("read");
exit(1);
}
/* print and done */
printf("maxusers(0x%x) = %d\n", nl[0].n_value, maxusers);
close(fd);
exit(0);
}


The read system call provides the address of themaxuserslocal variable as the buffer into which the value of kernel variable isread. The size provided is the size of the variable itself. When thisprogram is compiled and run, the output is:

 maxusers(0xe0518174) = 16

Once again, the address0xe0518174 appears. This value is the same as that provided by thenm command when it is searching the localksyms file.

Now we can translate the logic followed by this program into the equivalent functionality by using thekvm library.

Usinglibkvm

Thefacilities presented in the previous example program are the low-levelmechanisms for accessing data within the running kernel. This same codecan be used on other Unix systems to retrieve data from the runningkernel, although the kernel variables will obviously be different. Thekvm library removes some of the system knowledge from the application program, so the coding of/dev/ksyms and/dev/kmem into the program are not necessary. The library also has all of the necessary character special files open and kept within thekvm “cookie” so that all necessary accesses can be accomplished by passing this cookie to thekvm functions.

The initial step is to call the open function,kvm_open. This function can be passed a number of null values for path names. When null values are passed,kvm_open will use default values that represent the currently running system. Remember that thekvm library can also be used to examine crash dumps, so you can specify akmem image, such as a crash dump image, instead of the running system image in/dev/kmem. The prototype for thekvm_open function with parameter names and comments added is:

extern kvm_t *kvm_open(char *namelist, // e.g., /dev/ksyms 
char *corefile, // e.g., /dev/kmem
char *swapfile, // the current swap file
int flag, // O_RDONLY or O_RDWR
char *errstr); // error prefix string

Theswapfile parameter is useful only when you are examining a running system and then only if thekvm_getu function is used to examine the u-area of a process.

An example of the use ofkvm_open with the default values is shown in Figure 15-5.

Figure 15-5. Example Use ofkvm_open with Default Values
kvm_t   *kd; 

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}

Withthis usage, the values for the running system become implementationdependent and are not included in the application program. The cookiereturned bykvm_open is reused for all of thekvm library functions, and among the data it contains are file descriptors for the core file and swap file.

Thekvm_nlist function looks the same as thenlist function with the exception that it uses thekvm cookie instead of a file descriptor. The return value is the same and the resultingnlist structure is the same. Figure 15-6 shows the call tokvm_nlist.

Figure 15-6. Call tokvm_nlist
/* use the kvm interface for nlist */ 
if (kvm_nlist(kd, nl) == -1) {
perror("kvm_nlist");
exit(1);
}

The final step is the use ofkvm_kread. This example differs from the non-kvmexample program in that it involves no explicit seek. Instead, theaddress of the kernel variable is included as a parameter to thekvm_kread call.kvm_kread is responsible for seeking to the correct offset and reading the data of the specified size. Figure 15-7 shows the call tokvm_kread, and Figure 15-8 shows the example program in its entirety.

Figure 15-7. Call tokvm_kread
/* a read from the kernel */ 
if (kvm_kread(kd, nl[0].n_value,
(char *) &maxusers, sizeof maxusers) == -1) {
perror("kvm_kread");
exit(1);
}

Figure 15-8. Simple Application That Uses thekvm Library
Code View: Scroll / Show All
/* compile with -lkvm -lelf */ 

#include
#include
#include
#include

main()
{
kvm_t*kd;
int maxusers;
static struct nlist nl[] = {
{ "maxusers" },
{ 0 }
};

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* use the kvm interface for nlist */
if (kvm_nlist(kd, nl) == -1) {
perror("kvm_nlist");
exit(1);
}
/* a read from the kernel */
if (kvm_kread(kd, nl[0].n_value,
(char *) &maxusers, sizeof maxusers) == -1) {
perror("kvm_kread");
exit(1);
}
/* print and done */
printf("maxusers(0x%x) = %d\n", nl[0].n_value, maxusers);
kvm_close(kd);
exit(0);
}


As expected, the output of this program is the same as that of the previous one. Runningtruss on this program will show some extra work being done because thekvm library assumes that any of thekvmfunctions may be called and a search of the swap file may be required.Nonetheless, this example, although still desperately lacking inportability because of the low-level nature of the subject matter, ismore portable across releases of Solaris since the details ofinterfacing to the operating system are left to the library.

Procs and U-Areas

The reason thekvmlibrary opens the swap area is so user application images can beaccessed as well as the kernel image. The kernel cannot be swapped out,but the user-level applications can. If an application does get swappedout, thekvm library can find the image in the swap area.

The second example program used thekvm_kread call to read the kernel memory. You can use thekvm_uread call to read user memory. In order for this command to work, make a call to thekvm_getu function so that thekvm library knows what u-area to read from. Thekvm_getu function needs astruct proc * as an argument, so call thekvm_getproc function first. The prototypes for these functions are:

extern struct proc *kvm_getproc(kvm_t *kd, int pid); 
extern struct user *kvm_getu(kvm_t *kd, struct proc *procp);

The third example program uses a combination of calls from the first two programs as well as thekvm_getproc andkvm_getu functions. The example shows how to read the address space of a user application, using thekvm library.

Since the address space of a user application rather than the kernel will be read from, we cannot use thekvm_nlistfunction since it searches the name list of the kernel. The name listof the user application needs to be searched, so we use thenlist function directly. Figure 15-9 shows the third example.

Figure 15-9. Use of thekvm_getproc and kvm_getu Functions
Code View: Scroll / Show All
/* compile with -lkvm -lelf */ 

#include
#include
#include
#include
#include
#include

intglobal_value = 0xBEEFFACE;

main()
{
int local_value;
kvm_t*kd;
pid_tpid = getpid();
struct proc *proc;
struct user *u;
static struct nlist nl[] = {
{ "global_value" },
{ 0 }
};

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* use nlist to retrieve symbols from the a.out binary */
if (nlist("a.out", nl) == -1) {
perror("nlist");
exit(1);
}
/* get the proc structure for this process id */
proc = kvm_getproc(kd, pid);
if (proc == 0) {
perror("kvm_getproc");
exit(1);
}
/* get the u-area */
u = kvm_getu(kd, proc);
if (u == 0) {
perror("kvm_getu");
exit(1);
}
/* read from the u-area */
if (kvm_uread(kd, nl[0].n_value,
(char *) &local_value, sizeof local_value) == -1) {
perror("kvm_uread");
exit(1);
}
/* print and done */
printf("global_value(0x%x) = 0x%X\n",
nl[0].n_value, local_value);
kvm_close(kd);
exit(0);
}


The output of the program is

global_value(0x8049bbc) = 0xBEEFFACE

For further verification of the value, we use thenm command to look up the value ofglobal_value in thea.out file, yielding

[Index]   Value       Size   Type  Bind  Other Shndx Name 
[56] |0x08049bbc|0x00000004|OBJT |GLOB |0 |17 |global_value

which shows the value field as0x08049bbc, the same as printed out by the example program.

Other Functions

The remaining functions in thekvm library concern the traversal and viewing of information about processes on the system. The process traversal functions arekvm_setproc andget_nextproc.kvm_getproc is also in this category but has been covered specifically by the third example program. Prototypes for these functions are:

extern int kvm_setproc(kvm_t *kd); 
extern struct proc *kvm_nextproc(kvm_t *kd);
extern int kvm_getcmd(kvm_t *kd, struct proc *procp, struct user *up, char
***argp, char ***envp);

Thekvm_setproc function acts as a rewind function for setting the logical “pointer” to the beginning of the list of processes. Thekvm_nextproc function simply retrieves the next process on this logical list.

The functionkvm_getcmd retrieves the calling arguments and the environment strings for the process specified by the parameters to the function.

These functions can be used together to build an application that works much like theps command. The fourth example in Figure 15-10 shows how to do this.

Figure 15-10. ofkvm_nextproc and kvm_getcmd
Code View: Scroll / Show All
/* compile with -D_KMEMUSER -lkvm -lelf */ 

#include
#include
#include
#include
#include
#include
main()
{
char**argv = 0;
char**env = 0;
int i;
kvm_t*kd;
struct pid pid;
struct proc *proc;
struct user *u;

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* "rewind" the get-process "pointer" */
if (kvm_setproc(kd) == -1) {
perror(“kvm_setproc”);
exit(1);
}
/* get the next proc structure */
while((proc = kvm_nextproc(kd)) != 0) {
/* kvm_kread the pid structure */
if (kvm_kread(kd, (unsigned long) proc->p_pidp,
(char *) &pid, sizeof pid) == -1) {
perror("kvm_kread");
exit(1);
}
/* reassign with the user space version */
proc->p_pidp = &pid;

/* get the u-area for this process */
u = kvm_getu(kd, proc);
if (u == 0) {
/* zombie */
if (proc->p_stat == SZOMB)
printf("%6d \n",
proc->p_pidp->pid_id);
continue;
}
/* read the command line info */
if (kvm_getcmd(kd, proc, u, &argv, &env) == -1) {
/* insufficient permission */
printf("%6d %s\n",
proc->p_pidp->pid_id, u->u_comm);
continue;
}
/* got what we need, now print it out */
printf("%6d ", proc->p_pidp->pid_id);
for(i=0; argv && argv[i]; i++)
printf("%s ", argv[i]);
/* need to free this value, malloc'ed by kvm_getcmd */
free(argv);
argv = 0;

/* now print the environment strings */
for(i=0; env && env[i]; i++)
printf("%s ", env[i]);
/* need to free this value, malloc'ed by kvm_getcmd */
free(env);
env = 0;
putchar('\n');
}
/* done */
kvm_close(kd);
exit(0);
}


Caveats

Extractingdata from a running kernel can present problems that may not be evidentto the uninitiated. Here are some of the issues that may arise duringuse of thekvm library.

  • Permission

    The files that need to be opened within the kvm library have permissions that permit only specific users to access them. If the kvm_open call fails with “permission denied,” you’ve run into this situation.

  • 64-bit pointers in future releases

    If the kernel is using 64-bit pointers, so must a user of the kvm library. See “64-bit Addressing” on page 138.

  • Concurrency

    Since the SunOS 5.x kernel is a multithreaded operating system, data structures within the kernel have locks associated with them to prevent the modification of an area of memory in parallel by two or more threads. This locking is in contrast to the monolithic operating system architecture that used elevated or masked processor interrupt levels to protect critical regions.

    This issue illuminates a shortcoming of accessing kernel memory from the user space as demonstrated by these examples. There is no mechanism whereby a user-level application can lock a data structure in the kernel so that it is not changed while being accessed and thereby cause the read to yield inaccurate data. This is, of course, a necessary precaution since a user-level application could lock a kernel data structure indefinitely and cause deadlock or starvation.

  • Pointers within data

    A common error in reading kernel data with kvm is to assume when reading structures with kvm_kread that all associated data has been read with it. If the structure read with kvm_kread contains a pointer, the data that it points at is not read. It must be read with another call to kvm_kread. This problem will manifest itself quickly with a segmentation violation. Figure 15-11 shows how to read the platform name from the root node of the devinfo tree.

Figure 15-11. Pointers Within Pointers
Code View: Scroll / Show All
/* compile with -lkvm -lelf */ 

#include
#include
#include
#include
#include
#include
#include
#include

main()
{
char name_buf[BUFSIZ];
kvm_t *kd;
caddr_t top_devinfo;
struct dev_inforoot_node;
static struct nlist nl[] = {
{ "top_devinfo" },
{ 0 }
};
/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* get the address of top_devinfo */
if (kvm_nlist(kd, nl) == -1) {
perror("kvm_nlist");
exit(1);
}
/* read the top_devinfo value, which is an address */
if (kvm_kread(kd, nl[0].n_value,
(char *) &top_devinfo, sizeof top_devinfo) == -1) {
perror("kvm_kread: top_devinfo");
exit(1);
}
/* use this address to read the root node of the devinfo tree */
if (kvm_kread(kd, (unsigned long) top_devinfo,
(char *) &root_node, sizeof root_node) == -1) {
perror("kvm_kread: root_node");
exit(1);
}
/* the devinfo structure contains a pointer to a string */
if (kvm_kread(kd, (unsigned long) root_node.devi_binding_name,
name_buf, sizeof name_buf) == -1) {
perror("kvm_kread: devi_binding_name");
exit(1);
}
/* finally! */
puts(name_buf);
kvm_close(kd);
exit(0);
}

The Traditional Kernel Memory Interface

The kvm library is the legacy interface for accessing kernel data in SunOS. Although it was available in SunOS 4.x, it is still not a library widely used by other Unix operating systems. The name stands for “kernel virtual memory,” which provides the data on which the library operates. This data takes the form of variables that provide feedback regarding the state of the operating system. From this data, you can extrapolate information regarding the relative performance of the computer. Performance analysis applications such as vmstat originally used this interface.

The kvm library provides a robust interface to accessing data within the address space of an operating system. This access includes a running operating system or the disk image of a dump of a running kernel, such as the result of a system crash. The files from which this data is read include the “core file” and the “swap file”; these files accommodate the situations when data to be read is no longer present in physical memory but has been written to the swap file, as is the case when the user area (u-area) of an application program is read—one of the capabilities of the kvm library.

On a system without a kvm library, you can create a simplified version of the library by opening the file /dev/kmem, which is a character special file that provides user-level access to kernel memory. You can retrieve symbols and their addresses within kernel memory by searching the name list of the operating system. Then, you can use the address of a variable as an offset to the lseek system call once /dev/kmem has been opened. The value of the variable can then be retrieved by the read system call. Data can also be written by means of the write system call, although this capability is quite dangerous and you should avoid using it. Instead, use system utilities that are designed specifically for modifying tunable parameters in the kernel.

An application program needs to know about the type and size of the variable being read from the kernel. Thus, this type of program is nonportable—not only to other Unix systems but even between releases of Solaris. The specifics of the internals of the operating system are not guaranteed to remain the same between releases. Only the documented interfaces such as the Device Driver Interface and Driver Kernel Interface (DDI/DKI) are obliged to remain consistent across releases.

The “ kvm” interface exposes the kernel data types to the application. When a 64-bit address space capability is added to Solaris, the kernel data types will include 64-bit addresses, and applications that still want to access kvm in this environment will have to be compiled in 64-bit mode. These applications include many performance tool data collectors.

Symbols

Variables in the kernel are the same as those in an application program. They have a type, a size, and an address within the memory of the running program, this program being the operating system. The library call for extracting this information from a binary is nlist. The declaration for nlist from nlist.h is

extern int nlist(const char *namelist, struct nlist *nl);

The first parameter is the path name of a binary to be searched for symbols. The second parameter is an array of struct nlist, which contains the names of the symbols to be searched for. This array is terminated with a name value of zero.

Figure 15-1 shows the declaration of the nlist structure from nlist.h. Figure 15-2 is an example declaration of an array of nlist structures to be passed to the nlist library call.

Figure 15-1. The nlist Structure Declaration
struct nlist {
char *n_name; /* symbol name */
long n_value; /* value of symbol */
short n_scnum; /* section number */
unsigned short n_type; /* type and derived type */
char n_sclass; /* storage class */
char n_numaux; /* number of aux. entries */
};

Note that in Figure 15-2, the only member of the aggregate that has an initialization value is the n_name member. This is the only a priori value and is all that is required for the library call to do its job.

Figure 15-2. Example Declaration Ready for nlist Call
static struct nlist nl[] = {
{"maxusers"},
{ 0 }
};

Once the call to nlist is successful, the n_value member will contain the address of the variable that can be used as the offset value to the lseek system call. Figure 15-3 shows the example call to nlist.

Figure 15-3. Call to nlist
if (nlist("/dev/ksyms', nl) == -1) {
perror("nlist");
exit(1);
}

In Figure 15-3, the name of the binary specified is actually a character special file that allows access to an ELF (executable and linking format) memory image containing the symbol and string table of the running kernel. This interface allows access to symbols of loadable device drivers whose symbols would not appear in the /kernel/genunix binary but are present in the running kernel. To illustrate, create a local version of the kernel symbols by copying with dd and inspect the resulting binary (this example used Solaris x86) by using the file and nm commands.

Code View: Scroll / Show All
#  dd if=/dev/ksyms of=ksyms.dump bs=8192 
# file ksyms.dump
ksyms.dump: ELF 32-bit LSB executable 80386 Version 1, statically linked, not
stripped
# /usr/ccs/bin/nm ksyms.dump | egrep '(Shndx|maxusers)'
[Index] Value Size Type Bind Other Shndx Name
[6705] |0xe0518174|0x00000004|OBJT |GLOB |0 |ABS |maxusers


The value field of 0xe0518174 represents the address of the maxusers variable and the offset value passed to lseek. You will see this again in the complete example program.

Extracting the Values

We have laid the foundation for retrieving the data from kernel memory, and now we can undertake the business of seeking and reading. First, we need a file descriptor. Opening /dev/kmem is not for the average user, though. This character special file is readable only by root or the group sys. And only root may write to /dev/kmem. Figure 15-4 shows the complete demo program.

Figure 15-4. Accessing Kernel Data Without the kvm Library
Code View: Scroll / Show All
/* compile with -lelf */ 

#include
#include
#include
#include
#include

main()
{
int fd;
int maxusers;
static struct nlist nl[] = {
{ "maxusers"},
{ 0 }
};

/* retrieve symbols from the kernel */
if (nlist("/dev/ksyms", nl) == -1) {
perror("nlist");
exit(1);
}
/* open kernel memory, read-only */
fd = open("/dev/kmem", O_RDONLY);
if (fd == -1) {
perror("open");
exit(1);
}
/* seek to the specified kernel address */
if (lseek(fd, nl[0].n_value, 0) == -1) {
perror("lseek");
exit(1);
}
/* read the value from kernel memory */
if (read(fd, &maxusers, sizeof maxusers) == -1) {
perror("read");
exit(1);
}
/* print and done */
printf("maxusers(0x%x) = %d\n", nl[0].n_value, maxusers);
close(fd);
exit(0);
}


The read system call provides the address of the maxusers local variable as the buffer into which the value of kernel variable is read. The size provided is the size of the variable itself. When this program is compiled and run, the output is:

 maxusers(0xe0518174) = 16

Once again, the address 0xe0518174 appears. This value is the same as that provided by the nm command when it is searching the local ksyms file.

Now we can translate the logic followed by this program into the equivalent functionality by using the kvm library.

Using libkvm

The facilities presented in the previous example program are the low-level mechanisms for accessing data within the running kernel. This same code can be used on other Unix systems to retrieve data from the running kernel, although the kernel variables will obviously be different. The kvm library removes some of the system knowledge from the application program, so the coding of /dev/ksyms and /dev/kmem into the program are not necessary. The library also has all of the necessary character special files open and kept within the kvm “cookie” so that all necessary accesses can be accomplished by passing this cookie to the kvm functions.

The initial step is to call the open function, kvm_open. This function can be passed a number of null values for path names. When null values are passed, kvm_open will use default values that represent the currently running system. Remember that the kvm library can also be used to examine crash dumps, so you can specify a kmem image, such as a crash dump image, instead of the running system image in /dev/kmem. The prototype for the kvm_open function with parameter names and comments added is:

extern kvm_t *kvm_open(char *namelist, // e.g., /dev/ksyms 
char *corefile, // e.g., /dev/kmem
char *swapfile, // the current swap file
int flag, // O_RDONLY or O_RDWR
char *errstr); // error prefix string

The swapfile parameter is useful only when you are examining a running system and then only if the kvm_getu function is used to examine the u-area of a process.

An example of the use of kvm_open with the default values is shown in Figure 15-5.

Figure 15-5. Example Use of kvm_open with Default Values
kvm_t   *kd; 

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}

With this usage, the values for the running system become implementation dependent and are not included in the application program. The cookie returned by kvm_open is reused for all of the kvm library functions, and among the data it contains are file descriptors for the core file and swap file.

The kvm_nlist function looks the same as the nlist function with the exception that it uses the kvm cookie instead of a file descriptor. The return value is the same and the resulting nlist structure is the same. Figure 15-6 shows the call to kvm_nlist.

Figure 15-6. Call to kvm_nlist
/* use the kvm interface for nlist */ 
if (kvm_nlist(kd, nl) == -1) {
perror("kvm_nlist");
exit(1);
}

The final step is the use of kvm_kread. This example differs from the non- kvm example program in that it involves no explicit seek. Instead, the address of the kernel variable is included as a parameter to the kvm_kread call. kvm_kread is responsible for seeking to the correct offset and reading the data of the specified size. Figure 15-7 shows the call to kvm_kread, and Figure 15-8 shows the example program in its entirety.

Figure 15-7. Call to kvm_kread
/* a read from the kernel */ 
if (kvm_kread(kd, nl[0].n_value,
(char *) &maxusers, sizeof maxusers) == -1) {
perror("kvm_kread");
exit(1);
}

Figure 15-8. Simple Application That Uses the kvm Library
Code View: Scroll / Show All
/* compile with -lkvm -lelf */ 

#include
#include
#include
#include

main()
{
kvm_t*kd;
int maxusers;
static struct nlist nl[] = {
{ "maxusers" },
{ 0 }
};

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* use the kvm interface for nlist */
if (kvm_nlist(kd, nl) == -1) {
perror("kvm_nlist");
exit(1);
}
/* a read from the kernel */
if (kvm_kread(kd, nl[0].n_value,
(char *) &maxusers, sizeof maxusers) == -1) {
perror("kvm_kread");
exit(1);
}
/* print and done */
printf("maxusers(0x%x) = %d\n", nl[0].n_value, maxusers);
kvm_close(kd);
exit(0);
}


As expected, the output of this program is the same as that of the previous one. Running truss on this program will show some extra work being done because the kvm library assumes that any of the kvm functions may be called and a search of the swap file may be required. Nonetheless, this example, although still desperately lacking in portability because of the low-level nature of the subject matter, is more portable across releases of Solaris since the details of interfacing to the operating system are left to the library.

Procs and U-Areas

The reason the kvm library opens the swap area is so user application images can be accessed as well as the kernel image. The kernel cannot be swapped out, but the user-level applications can. If an application does get swapped out, the kvm library can find the image in the swap area.

The second example program used the kvm_kread call to read the kernel memory. You can use the kvm_uread call to read user memory. In order for this command to work, make a call to the kvm_getu function so that the kvm library knows what u-area to read from. The kvm_getu function needs a struct proc * as an argument, so call the kvm_getproc function first. The prototypes for these functions are:

extern struct proc *kvm_getproc(kvm_t *kd, int pid); 
extern struct user *kvm_getu(kvm_t *kd, struct proc *procp);

The third example program uses a combination of calls from the first two programs as well as the kvm_getproc and kvm_getu functions. The example shows how to read the address space of a user application, using the kvm library.

Since the address space of a user application rather than the kernel will be read from, we cannot use the kvm_nlist function since it searches the name list of the kernel. The name list of the user application needs to be searched, so we use the nlist function directly. Figure 15-9 shows the third example.

Figure 15-9. Use of the kvm_getproc and kvm_getu Functions
Code View: Scroll / Show All
/* compile with -lkvm -lelf */ 

#include
#include
#include
#include
#include
#include

intglobal_value = 0xBEEFFACE;

main()
{
int local_value;
kvm_t*kd;
pid_tpid = getpid();
struct proc *proc;
struct user *u;
static struct nlist nl[] = {
{ "global_value" },
{ 0 }
};

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* use nlist to retrieve symbols from the a.out binary */
if (nlist("a.out", nl) == -1) {
perror("nlist");
exit(1);
}
/* get the proc structure for this process id */
proc = kvm_getproc(kd, pid);
if (proc == 0) {
perror("kvm_getproc");
exit(1);
}
/* get the u-area */
u = kvm_getu(kd, proc);
if (u == 0) {
perror("kvm_getu");
exit(1);
}
/* read from the u-area */
if (kvm_uread(kd, nl[0].n_value,
(char *) &local_value, sizeof local_value) == -1) {
perror("kvm_uread");
exit(1);
}
/* print and done */
printf("global_value(0x%x) = 0x%X\n",
nl[0].n_value, local_value);
kvm_close(kd);
exit(0);
}


The output of the program is

global_value(0x8049bbc) = 0xBEEFFACE

For further verification of the value, we use the nm command to look up the value of global_value in the a.out file, yielding

[Index]   Value       Size   Type  Bind  Other Shndx Name 
[56] |0x08049bbc|0x00000004|OBJT |GLOB |0 |17 |global_value

which shows the value field as 0x08049bbc, the same as printed out by the example program.

Other Functions

The remaining functions in the kvm library concern the traversal and viewing of information about processes on the system. The process traversal functions are kvm_setproc and get_nextproc. kvm_getproc is also in this category but has been covered specifically by the third example program. Prototypes for these functions are:

extern int kvm_setproc(kvm_t *kd); 
extern struct proc *kvm_nextproc(kvm_t *kd);
extern int kvm_getcmd(kvm_t *kd, struct proc *procp, struct user *up, char
***argp, char ***envp);

The kvm_setproc function acts as a rewind function for setting the logical “pointer” to the beginning of the list of processes. The kvm_nextproc function simply retrieves the next process on this logical list.

The function kvm_getcmd retrieves the calling arguments and the environment strings for the process specified by the parameters to the function.

These functions can be used together to build an application that works much like the ps command. The fourth example in Figure 15-10 shows how to do this.

Figure 15-10. of kvm_nextproc and kvm_getcmd
Code View: Scroll / Show All
/* compile with -D_KMEMUSER -lkvm -lelf */ 

#include
#include
#include
#include
#include
#include
main()
{
char**argv = 0;
char**env = 0;
int i;
kvm_t*kd;
struct pid pid;
struct proc *proc;
struct user *u;

/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* "rewind" the get-process "pointer" */
if (kvm_setproc(kd) == -1) {
perror(“kvm_setproc”);
exit(1);
}
/* get the next proc structure */
while((proc = kvm_nextproc(kd)) != 0) {
/* kvm_kread the pid structure */
if (kvm_kread(kd, (unsigned long) proc->p_pidp,
(char *) &pid, sizeof pid) == -1) {
perror("kvm_kread");
exit(1);
}
/* reassign with the user space version */
proc->p_pidp = &pid;

/* get the u-area for this process */
u = kvm_getu(kd, proc);
if (u == 0) {
/* zombie */
if (proc->p_stat == SZOMB)
printf("%6d \n",
proc->p_pidp->pid_id);
continue;
}
/* read the command line info */
if (kvm_getcmd(kd, proc, u, &argv, &env) == -1) {
/* insufficient permission */
printf("%6d %s\n",
proc->p_pidp->pid_id, u->u_comm);
continue;
}
/* got what we need, now print it out */
printf("%6d ", proc->p_pidp->pid_id);
for(i=0; argv && argv[i]; i++)
printf("%s ", argv[i]);
/* need to free this value, malloc'ed by kvm_getcmd */
free(argv);
argv = 0;

/* now print the environment strings */
for(i=0; env && env[i]; i++)
printf("%s ", env[i]);
/* need to free this value, malloc'ed by kvm_getcmd */
free(env);
env = 0;
putchar('\n');
}
/* done */
kvm_close(kd);
exit(0);
}


Caveats

Extracting data from a running kernel can present problems that may not be evident to the uninitiated. Here are some of the issues that may arise during use of the kvm library.

  • Permission

    The files that need to be opened within the kvm library have permissions that permit only specific users to access them. If the kvm_open call fails with “permission denied,” you’ve run into this situation.

  • 64-bit pointers in future releases

    If the kernel is using 64-bit pointers, so must a user of the kvm library. See “64-bit Addressing” on page 138.

  • Concurrency

    Since the SunOS 5.x kernel is a multithreaded operating system, data structures within the kernel have locks associated with them to prevent the modification of an area of memory in parallel by two or more threads. This locking is in contrast to the monolithic operating system architecture that used elevated or masked processor interrupt levels to protect critical regions.

    This issue illuminates a shortcoming of accessing kernel memory from the user space as demonstrated by these examples. There is no mechanism whereby a user-level application can lock a data structure in the kernel so that it is not changed while being accessed and thereby cause the read to yield inaccurate data. This is, of course, a necessary precaution since a user-level application could lock a kernel data structure indefinitely and cause deadlock or starvation.

  • Pointers within data

    A common error in reading kernel data with kvm is to assume when reading structures with kvm_kread that all associated data has been read with it. If the structure read with kvm_kread contains a pointer, the data that it points at is not read. It must be read with another call to kvm_kread. This problem will manifest itself quickly with a segmentation violation. Figure 15-11 shows how to read the platform name from the root node of the devinfo tree.

Figure 15-11. Pointers Within Pointers
Code View: Scroll / Show All
/* compile with -lkvm -lelf */ 

#include
#include
#include
#include
#include
#include
#include
#include

main()
{
char name_buf[BUFSIZ];
kvm_t *kd;
caddr_t top_devinfo;
struct dev_inforoot_node;
static struct nlist nl[] = {
{ "top_devinfo" },
{ 0 }
};
/* open kvm */
kd = kvm_open(0, 0, 0, O_RDONLY, 0);
if (kd == 0) {
perror("kvm_open");
exit(1);
}
/* get the address of top_devinfo */
if (kvm_nlist(kd, nl) == -1) {
perror("kvm_nlist");
exit(1);
}
/* read the top_devinfo value, which is an address */
if (kvm_kread(kd, nl[0].n_value,
(char *) &top_devinfo, sizeof top_devinfo) == -1) {
perror("kvm_kread: top_devinfo");
exit(1);
}
/* use this address to read the root node of the devinfo tree */
if (kvm_kread(kd, (unsigned long) top_devinfo,
(char *) &root_node, sizeof root_node) == -1) {
perror("kvm_kread: root_node");
exit(1);
}
/* the devinfo structure contains a pointer to a string */
if (kvm_kread(kd, (unsigned long) root_node.devi_binding_name,
name_buf, sizeof name_buf) == -1) {
perror("kvm_kread: devi_binding_name");
exit(1);
}
/* finally! */
puts(name_buf);
kvm_close(kd);
exit(0);
}


Summary

We have laid out all the necessary tools for accessing kernel data. The remaining piece is knowing what data is useful for extraction. This information changes between different Unix systems and, for many cases, between releases of Solaris. The header files in /usr/include/sys define many of the data structures and variables in the kernel, but it is generally a bad idea to use kvm nowadays. There is a far better alternative, which we describe next.



Summary

Wehave laid out all the necessary tools for accessing kernel data. Theremaining piece is knowing what data is useful for extraction. Thisinformation changes between different Unix systems and, for many cases,between releases of Solaris. The header files in/usr/include/sys define many of the data structures and variables in the kernel, but it is generally a bad idea to usekvm nowadays. There is a far better alternative, which we describe next.

The Solaris 2 “kstat” Interface

Thekstatlibrary is a collection of functions for accessing data stored withinuser-level data structures that are copies of similar structures in thekernel. The nature of this data concerns the functioning of theoperating system, specifically in the areas of performance metrics,device configuration, and capacity measurement. The name kstat means “kernel statistics” to denote this role.

The collection of structures, both in user space and in the kernel, is referred to as thekstat chain. This chain is a linked list of structures. The user chain is accessible through thekstat library, and the kernel chain is accessible through the/dev/kstat character special file. The sum of the components is referred to as thekstat framework.

Thekstat framework, shown in Figure 15-12,consists of the kernel chain, a loadable device driver that acts as theliaison between the kernel chain and the user chain, the user chain,and thekstat library, which acts as the liaison between the user chain and the application program.

Figure 15-12. Thekstat Framework


The library and driver within the kernel are tightly coupled. When thekstat library sends anioctl request to the driver to readkstatdata, the driver can lock the appropriate data structures to ensureconsistency of the data being read. This point is important since thereading of data that is being written simultaneously by another kernelthread could result in incorrect data being transmitted to the userspace.

Link Structure

Each link in thekstat chain represents a functional component such as inode cache statistics. A link is identified by three values:

  • Module — A functional area such as a class or category.

  • Instance number — A numerical occurrence of this module.

  • Name — The text name for the link; an example of the names of two Lance Ethernet devices on a machine is:

    • le.0.le0

    • le.1.le1

Thedots connecting the parts are for notational convenience and do nothave any relation to the full name. In this case, the category isle, the instance numbers are 0 and 1, and the names are the concatenation of the module and instance,le0 andle1. The juxtaposition of the module and instance to form the name is neither uniform nor mandatory.

kstat Data Types

Eachlink in the chain has a pointer to the data for that link. The pointerdoes not point to one uniform structure. A type field within the linkdescribes what the pointer is pointing at. The types and the type ofdata being pointed to are one of five different types:

  • KSTAT_TYPE_RAW — Points to a C structure that is cast to the appropriate structure pointer type and indirected as a structure pointer.

  • KSTAT_TYPE_NAMED — Points to an array of structures that contain a name and polymorphic value represented by a union and type flag. The supported types for the union in releases prior to Solaris 2.6 are:

KSTAT_DATA_CHAR 1-byte integer signed KSTAT_DATA_LONG 4-byte integer signed KSTAT_DATA_ULONG 4-byte integer unsigned KSTAT_DATA_LONGLONG 8-byte integer signed KSTAT_DATA_ULONGLONG 8-byte integer unsigned KSTAT_DATA_FLOAT 4-byte floating point KSTAT_DATA_DOUBLE 8-byte floating point

The types in Solaris 2.6 and later releases are:

KSTAT_DATA_CHAR 1-byte integer signed KSTAT_DATA_INT32 4-byte integer signed KSTAT_DATA_UINT32 4-byte integer unsigned KSTAT_DATA_INT64 8-byte integer signed KSTAT_DATA_UINT64 8-byte integer unsigned

TheLONG,ULONG,LONGLONG, andULONGLONG names are maintained for portability but are obsolete. Since theFLOAT andDOUBLEtypes were never used, they have been deleted. These new values areexplicitly portable between 32-bit and 64-bit kernel implementations.

  • KSTAT_TYPE_INTR — Points to a structure containing information pertaining to interrupts.

  • KSTAT_TYPE_IO — Points to a structure containing information pertaining to I/O devices, disks, and, in Solaris 2.6, disk partitions, tape drives, and NFS client mount points. Also, with Solstice DiskSuite version 4.1 and later, metadevices will appear here.

  • KSTAT_TYPE_TIMER — The representation of this type is the same as that for KSTAT_TYPE_NAMED.

A visual representation of the aforementionedle devices is shown in Figure 15-13.

Figure 15-13.kstat Chain Structure Example


Thekstat Library

The programming model for accessingkstat data is:

  • Open

  • Traverse the kstat chain, reading any links of interest

  • Close

Manual pages are available for thekstat library functions, starting with SunOS 5.4. The minimal set ofkstat functions necessary for accessingkstat data from the chain are as follows.

The initial call to initialize the library, open the/dev/kstat device, and build the user chain:

extern kstat_ctl_t *kstat_open(void);

To read a link in the chain:.

 extern kid_t kstat_read(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

Reciprocal ofkstat_open: to terminatekstat functions, release heap memory, and close the/dev/kstat device:

extern int kstat_close(kstat_ctl_t *kc);

Additional functions for accessingkstat data are the lookup functions.

To look up a specific link in the chain:

Code View:Scroll/Show All
extern kstat_t *kstat_lookup(kstat_ctl_t *kc, char *ks_module, int ks_instance, 
char *ks_name);


To look up a symbolic name within aKSTAT_TYPE_NAMED link.

extern void *kstat_data_lookup(kstat_t *ksp, char *name);

To synchronize the user chain with the kernel chain:.

extern kid_t kstat_chain_update(kstat_ctl_t *kc);

To write thekstat link back to the kernel, if possible:

extern kid_t kstat_write(kstat_ctl_t *kc, kstat_t *kp, void *buf);

You can use thekstat_lookupfunction to look up links in the chain by their explicit name. Use thisfunction when access to a unique link is required. If the link beingsought wasufs.0.inode_cache, thisfunction would be convenient to use instead of writing the code totraverse the chain. In the case of a link representing a resource withmultiple instances, such as network interfaces or disks, it isnecessary to traverse the chain manually, searching for the module nameof interest.

Use the functionkstat_data_lookup to look up the name of a member of aKSTAT_TYPE_NAMED structure. TheKSTAT_TYPE_NAMED links have akstat_datamember that points to an array of structures that represent the membersof a structure. This function traverses the array, finds the structurewith that name, and returns a pointer to it.

A minimal program for traversing thekstat chain is shown in Figure 15-14.

Figure 15-14. A Simple Program for Traversing thekstat Chain
Code View: Scroll / Show All
/* compile with cc ex1.c -lkstat */ 

#include
#include
#include

main()
{
kstat_ctl_t*kc;
kstat_t *kp;
static char*type_names[] = { /* map ints to strings */
"KSTAT_TYPE_RAW",
"KSTAT_TYPE_NAMED",
"KSTAT_TYPE_INTR",
"KSTAT_TYPE_IO",
"KSTAT_TYPE_TIMER"
};

/* initialize the kstat interface */
kc = kstat_open();
if (kc == 0) {
perror("kstat_open");
exit(1);
}
/* traverse the chain */
for (kp = kc->kc_chain; kp; kp = kp->ks_next)
printf("%-16.16s %s.%d.%s\n", type_names[kp->ks_type],
kp->ks_module, kp->ks_instance, kp->ks_name);
/* done */
kstat_close(kc);
return 0;
}


Reading thekstat Chain

By itself,kstat_openwill construct the entire chain without reading the data for theindividual links. If you need the data associated with a link, usekstat_read to read that data. Figure 15-17 useskstat_lookup,kstat_read, andkstat_data_lookup to find the inode cache data and display the cache hit rate. Figure 15-18 shows the use ofkstat_read andkstat_data_lookup to display the statistics of thele interfaces on the system.

Figure 15-15. Use ofkstat_lookup and kstat_data_lookup
Code View: Scroll / Show All
/* compile with cc ex2.c -lkstat */ 

#include
#include
#include

static ulong_t *get_named_member(kstat_t *, char *);

main()
{
kstat_ctl_t*kc;
kstat_t *kp;
ulong_t *hits;
ulong_t *misses;
ulong_t total;

/* initialize the kstat interface */
kc = kstat_open();
if (kc == 0) {
perror("kstat_open");
exit(1);
}
/* find the inode cache link */
kp = kstat_lookup(kc, "ufs", 0, "inode_cache");
if (kp == 0) {
fputs("Cannot find ufs.0.inode_cache\n", stderr);
exit(1);
}
/* read the ks_data */
if (kstat_read(kc, kp, 0) == -1) {
perror("kstat_read");
exit(1);
}
/* get pointers to the hits and misses values */
hits = get_named_member(kp, "hits");
misses = get_named_member(kp, "misses");
total = *hits + *misses;

/* print the hit rate percentage */
printf("inode cache hit rate: %5.2f %%\n",
((double) *hits * 100.0) / (double) total);
/* done */
kstat_close(kc);
return 0;
}


/* return a pointer to the value inside the kstat_named_t structure */
static ulong_t *
get_named_member(kstat_t *kp, char *name)
{
kstat_named_t*kn;

kn = (kstat_named_t * ) kstat_data_lookup(kp, name);
if (kn == 0) {
fprintf(stderr, "Cannot find member: %s\n", name);
exit(1);
}
return &kn->value.ul;
}


Figure 15-16. Traversing the Chain, Looking for “le” Devices
Code View: Scroll / Show All
/* compile with cc ex3.c -lkstat */ 

#include
#include
#include
#include

static ulong_t *get_named_member(kstat_t *, char *);

main()
{
ulong_t *ipackets;
ulong_t *opackets;
ulong_t *ierrors;
ulong_t *oerrors;
ulong_t *collisions;
double collision_rate;
kstat_ctl_t*kc;
kstat_t *kp;

/* initialize the kstat interface */
kc = kstat_open();
if (kc == 0) {
perror("kstat_open");
exit(1);
}
/* print the header */
printf("%-8.8s %-10.10s %-10.10s %-10.10s %-10.10s %-10.10s %s\n",
"Name", "Ipkts", "Ierrs",
"Opkts", "Oerrs", "Collis", "Collis-Rate");

/* traverse the chain looking for "module" name "le" */
for (kp = kc->kc_chain; kp; kp = kp->ks_next) {
/* only interested in named types */
if (kp->ks_type != KSTAT_TYPE_NAMED)
continue;
/* only interested in "le" module names */
if (strcmp(kp->ks_module, "le") != 0)
continue;
/* read ks_data */
if (kstat_read(kc, kp, 0) == -1) {
perror("kstat_read");
exit(1);
}
/* get pointers to members of interest */
ipackets = get_named_member(kp, “ipackets”);
opackets = get_named_member(kp, “opackets”);
ierrors = get_named_member(kp, “ierrors”);
oerrors = get_named_member(kp, “oerrors”);
collisions = get_named_member(kp, “collisions”);

/* compute and print */
if (*opackets)
collision_rate = (*collisions * 100) / *opackets;
else
collision_rate = 0.0;
printf("%-8.8s %-10d %-10d %-10d %-10d %-10d %-5.2f %%\n",
kp->ks_name, *ipackets, *ierrors,
*opackets, *oerrors, *collisions, collision_rate);
}

/* done */
kstat_close(kc);
return 0;
}

/* return a pointer to the value inside the kstat_named_t structure */
static ulong_t *
get_named_member(kstat_t *kp, char *name)
{
kstat_named_t*kn;

kn = (kstat_named_t *) kstat_data_lookup(kp, name);
if (kn == 0) {
fprintf(stderr, "Cannot find member: %s\n", name);
exit(1);
}
return & kn->value.ul;
}


Figure 15-17. Use ofkstat_lookup and kstat_data_lookup
Code View: Scroll / Show All
/* compile with cc ex2.c -lkstat */ 

#include
#include
#include

static ulong_t *get_named_member(kstat_t *, char *);

main()
{
kstat_ctl_t*kc;
kstat_t *kp;
ulong_t *hits;
ulong_t *misses;
ulong_t total;

/* initialize the kstat interface */
kc = kstat_open();
if (kc == 0) {
perror("kstat_open");
exit(1);
}
/* find the inode cache link */
kp = kstat_lookup(kc, "ufs", 0, "inode_cache");
if (kp == 0) {
fputs("Cannot find ufs.0.inode_cache\n", stderr);
exit(1);
}
/* read the ks_data */
if (kstat_read(kc, kp, 0) == -1) {
perror("kstat_read");
exit(1);
}
/* get pointers to the hits and misses values */
hits = get_named_member(kp, "hits");
misses = get_named_member(kp, "misses");
total = *hits + *misses;

/* print the hit rate percentage */
printf("inode cache hit rate: %5.2f %%\n",
((double) *hits * 100.0) / (double) total);

/* done */
kstat_close(kc);
return 0;
}


/* return a pointer to the value inside the kstat_named_t structure */
static ulong_t *
get_named_member(kstat_t *kp, char *name)
{
kstat_named_t*kn;

kn = (kstat_named_t * ) kstat_data_lookup(kp, name);
if (kn == 0) {
fprintf(stderr, "Cannot find member: %s\n", name);
exit(1);
}
return &kn->value.ul;
}


Figure 15-18. Traversing the Chain, Looking for “le” Devices
Code View: Scroll / Show All
/* compile with cc ex3.c -lkstat */ 

#include
#include
#include
#include

static ulong_t *get_named_member(kstat_t *, char *);

main()
{
ulong_t *ipackets;
ulong_t *opackets;
ulong_t *ierrors;
ulong_t *oerrors;
ulong_t *collisions;
double collision_rate;
kstat_ctl_t*kc;
kstat_t *kp;

/* initialize the kstat interface */
kc = kstat_open();
if (kc == 0) {
perror("kstat_open");
exit(1);
}
/* print the header */
printf("%-8.8s %-10.10s %-10.10s %-10.10s %-10.10s %-10.10s %s\n",
"Name", "Ipkts", "Ierrs",
"Opkts", "Oerrs", "Collis", "Collis-Rate");

/* traverse the chain looking for "module" name "le" */
for (kp = kc->kc_chain; kp; kp = kp->ks_next) {
/* only interested in named types */
if (kp->ks_type != KSTAT_TYPE_NAMED)
continue;
/* only interested in "le" module names */
if (strcmp(kp->ks_module, "le") != 0)
continue;
/* read ks_data */
if (kstat_read(kc, kp, 0) == -1) {
perror("kstat_read");
exit(1);
}
/* get pointers to members of interest */
ipackets = get_named_member(kp, "ipackets");
opackets = get_named_member(kp, "opackets");
ierrors = get_named_member(kp, "ierrors");
oerrors = get_named_member(kp, "oerrors");
collisions = get_named_member(kp, "collisions");

/* compute and print */
if (*opackets)
collision_rate = (*collisions * 100) / *opackets;
else
collision_rate = 0.0;
printf("%-8.8s %-10d %-10d %-10d %-10d %-10d %-5.2f %%\n",
kp->ks_name, *ipackets, *ierrors,
*opackets, *oerrors, *collisions, collision_rate);
}

/* done */
kstat_close(kc);
return 0;
}

/* return a pointer to the value inside the kstat_named_t structure */
static ulong_t *
get_named_member(kstat_t *kp, char *name)
{
kstat_named_t*kn;

kn = (kstat_named_t *) kstat_data_lookup(kp, name);
if (kn == 0) {
fprintf(stderr, "Cannot find member: %s\n", name);
exit(1);
}
return & kn->value.ul;
}


The above examples show how to obtain a link in thekstat chain, read it, look up values, and display information regarding thekstat data. But this sequence of operations is done only once. In the case of programs such asvmstat,iostat, andmpstat, the output is continually displayed on a regular interval. When this regular display is done, the functionkstat_chain_updatemust be called before every access to ensure that the user chainmatches the kernel chain. Since these examples use pointers to valuesinside of a structure of typekstat_named_t,it is possible that the pointer references a value inside a structurethat is no longer valid since the user chain may have changed inresponse to the modified kernel chain. When this situation happens, thepointers must be reinitialized to reflect the new user chain. Figure 15-19 demonstrates the use ofkstat_chain_update.

Figure 15-19. Use ofkstat_chain_update
Code View: Scroll / Show All
/* compile with -lkstat */ 

#include
#include
#include
#include
#define MINUTES 60
#define HOURS (60 * MINUTES)
#define DAYS (24 * HOURS)

static kstat_t *build_kstats(kstat_ctl_t *);
static ulong_t *get_named_member(kstat_t *, char *);

main()
{
int i;
ulong_t *clk_intr;
ulong_t hz = sysconf(_SC_CLK_TCK);
ulong_t days;
ulong_t hours;
ulong_t minutes;
ulong_t seconds;
kstat_ctl_t*kc;
kstat_t *kp;

/* initialize the kstat interface */
kc = kstat_open();
if (kc == 0) {
perror("kstat_open");
exit(1);
}

/* get the link and read it in */
kp = build_kstats(kc);

/* get a pointer to the clk_intr member */
clk_intr = get_named_member(kp, "clk_intr");

/* do forever */
for (;;) {
/* loop until kstat_chain_update returns zero */
for (; (i = kstat_chain_update(kc)) != 0; ) {
switch (i) {
case -1:
perror("kstat_chain_update");
exit(1);
default:
/* state change, rebuild and reread */
puts("<<<<<< STATE CHANGE >>>>>>");
kp = build_kstats(kc);
clk_intr = get_named_member(kp, "clk_intr");
break;
}
}
/* compute and print */
seconds = *clk_intr / hz;
days = seconds / DAYS;
seconds -= (days * DAYS);
hours = seconds / HOURS;
seconds -= (hours * HOURS);
minutes = seconds / MINUTES;
seconds -= (minutes * MINUTES);
printf(
"System up for %4d days %2d hours %2d minutes %2d seconds\r",
days, hours, minutes, seconds);
fflush(stdout);

/* pause a second */
sleep(1);

/* update the link */
if (kstat_read(kc, kp, 0) == -1) {
perror("kstat_read");
exit(1);
}
}
}

/* look up the link and read ks_data */
static kstat_t *
build_kstats(kstat_ctl_t *kc)
{
kstat_t*kp;

kp = kstat_lookup(kc, "unix", 0, "system_misc");
if (kp == 0) {
fputs("Cannot find unix.0.system_misc\n", stderr);
exit(1);
}
if (kstat_read(kc, kp, 0) == -1) {
perror("kstat_read");
exit(1);
}
return kp;
}
/* return a pointer to the value inside the kstat_named_t structure */
static ulong_t *
get_named_member(kstat_t *kp, char *name)
{
kstat_named_t*kn;

kn = (kstat_named_t *) kstat_data_lookup(kp, name);
if (kn == 0) {
fprintf(stderr, "Cannot find member: %s\n", name);
exit(1);
}
return &kn->value.ul;
}


Figure 15-19 continually displays how long the computer has been up by retrieving theclk_intr (clock interrupts) member from theunix.0.system_misc link. This code also uses thesysconf function to retrieve the value ofhz, which is the number of times per second a clock interrupt is received by the operating system. By dividingclk_intr byhz, you can compute the number of seconds the system has been up.

Writing thekstat Chain

Sincethe user chain is a copy of the kernel chain, you can modify a link andwrite it back to the kernel. However, some factors determine whetherthe data will actually be written. The first determining factor iswhether the writing process has the correct permissions Only thesuperuser can write data back to the kernel. The second factor is thatthe individual subsystem in the kernel holding thekstatdata within that link must determine whether the data will actually becopied back to kernel space. Many of the subsystems that contributedata to thekstat chain do not allow data to be written back. One subsystem that does is NFS. Thenfsstat command has a-z option that specifies reinitialization of the counters within NFS and RPC by means of thekstat_write command.

Caveats

Thekstat library is not very complex at the user level. Here are some minor issues that you should know about.

  • Permission

    The only kstat call that requires any level of permissions is kstat_write. Modification of kernel structures should not be taken lightly, and therefore, root permission is required. An advantage to kstat_write over similar methods using the kvm library is that the data is bounded. The subsystem holding the kstat values being written will receive the values to be written and put them away in an orderly fashion. kvm_kwrite, on the other hand, will allow arbitrary writes to any valid memory address.

  • KSTAT_TYPE_RAW data structures

    The KSTAT_TYPE_RAW data type provides a pointer to a data structure in its binary form. There is an indication of the size of the data structure, but resolving what data structure is actually pointed to is up to the programmer.

  • Chain updates

    Calling kstat_chain_update to synchronize the user and kernel chains can result in an indication to the application that the chain has changed. Be careful to discontinue using any prestored pointers into the old chain. If a referencing mechanism has been built around the chain, then if the call to kstat_chain_update indicates a change, the old referencing structure must be torn down and a new one built using the new chain.

Summary

Thekstatlibrary provides a lightweight and uniform method for accessingperformance data in the Solaris 2, SunOS 5.x kernel. It is specific toSunOS but is provided on both platform-specific releases of SunOS:SPARC and Intel.

Network Protocol (MIB) Statistics via Streams

MIBdata is information about a device on a network. MIB stands formanagement information base. This information provides feedbackregarding the operation of a device, primarily throughput informationbut also more detailed information regarding the overall state of adevice. It is the MIB data that is delivered through SNMP (SimpleNetwork Management Protocol).

Thepurpose of this section is not to articulate how to read SNMP anddisseminate it but to describe how the MIB data is retrieved by aprocess in a streams-based environment. Specifically, the Solarisimplementation is covered, although in theory, this code should work onany streams-based Unix system. In effect, this is how an SNMP daemonfor Solaris obtains its data.

The Streams Mechanism

Starting with Unix System 5, Release 3 (SVR3), the operating system contains a streams mechanism. This mechanism is a capability within the operating system whereby modulesrepresenting functional components are associated with each otherwithin a stack data structure. Modules are pushed onto this stack, andwhen the stack of modules is built, data is passed to the module on thetop of the stack; that module passes the data to the module under it.In turn, that module passes the data to the module under it, and soforth until the data reaches the first module that was pushed onto thestack. This stack of modules within the kernel is called a stream.

When data is sent from an application into the kernel to travel through a stream, the data is flowing downstream. Data flowing from the kernel back to user space is traveling upstream.

Sincenetwork protocols are also stack based, the streams model is anappropriate implementation vehicle. Data flows through each module ofthe network stack until it reaches either the device driver or theapplication.

As an example, Figure 15-20 shows the layering of the RSTATD protocol.

Figure 15-20. The RSTATD Protocol Stack


Inthis example, the first module pushed onto the stream would be the IPmodule, followed by the UPD module. When data is written onto thisstream by therpc.rstatdapplication, it will first go through the internals of the RPC library,which exists in user space. Then, it will enter kernel space to flowthrough the UDP module and then through the IP module. The IP modulewill hand off the data to the device driver for the Network InterfaceCard (NIC). This procedure is how the packet is built and encapsulatedas it travels downstream. Figure 15-21 demonstrates the creation of a stream and the pushing and popping of several modules onto the stream.

Figure 15-21. Pushing, Popping, and Identifying Streams Modules
Code View: Scroll / Show All
#include  
#include
#include
#define DEV_TCP "/dev/tcp" /* use /dev/tcp, /dev/ip is 660 mode */
#define ARP_MODULE "arp"
#define TCP_MODULE "tcp"
#define UDP_MODULE "udp"

static void
fatal(char *f, char *p)
{
charbuf[BUFSIZ];

sprintf(buf, f, p);
perror(buf);
exit(1);
}

main()
{
charbuf[BUFSIZ];
int mib_sd;

mib_sd = open(DEV_TCP, O_RDWR);
if (mib_sd == -1)
fatal("cannot open %s", DEV_TCP);

/* empty the stream */
while (ioctl(mib_sd, I_POP, 0) != -1) ;

/* load up the stream with all these modules */
if (ioctl(mib_sd, I_PUSH, ARP_MODULE) == -1)
fatal("cannot push %s module", ARP_MODULE);
if (ioctl(mib_sd, I_PUSH, TCP_MODULE) == -1)
fatal("cannot push %s module", TCP_MODULE);
if (ioctl(mib_sd, I_PUSH, UDP_MODULE) == -1)
fatal("cannot push %s module", UDP_MODULE);

/* now unload them and print them out */
while (ioctl(mib_sd, I_LOOK, buf) != -1) {
puts(buf);
if (ioctl(mib_sd, I_POP, 0) == -1)
fatal("ioctl(%s)", "I_POP");
}
close(mib_sd);
exit(0);
}


When this program is run, the output is:

% pushpop 
udp
tcp
arp

This order is the reverse of how the modules were pushed, showing the last-in, first-out (LIFO) nature of the stream structure.

Option Management Requests

Inaddition to data that is to be written out to the network, controlmessages can be sent downstream to alter the way the modules in thestream do their job or to retrieve information from the modules.Control messages are general-purpose structures for sending arbitrarydata to streams modules. A process sends a control message, and anacknowledgment is sent back to the process from the modules.

Oneof the control messages that can be sent downstream to the modules isan option management request. This type of control message is aninformational packet to be delivered to specific modules in the stream.In the case of retrieving MIB data from the streams module, only a “getall” type of message is supported. Therefore, the control message for a“get” of MIB data is sent to all modules in the stream regardless ofthe type of MIB data requested.

Wheneach module in the stream receives the MIB get request, it constructs areply in the respective structure defined above for each protocol andsends the data back in an acknowledgment message. Hence, when a MIB getoperation is performed, one write to the stream is performed, but Nreads are performed, where N is the number of modules in the streamthat can respond to this type of request. In the case of the NFS stack,there will be two responses, one from IP and the other from TCP or UDP.Usually, a stack is constructed for the single purpose of extractingMIB data from the modules that are maintaining a MIB structure.

MIB Structures in Solaris

The include file/usr/include/inet/mib2.hdefines many structures that define the MIB in Solaris. Each structureis specific to a particular protocol and, in the streams sense, astreams module. The MIB structures currently defined are as follows.

For IP:

mib2_ip Throughput and operation mib2_ipAddrEntry Addressing information mib2_ipRouteEntry Routing table entries mib2_ipNetToMediaEntry Logical to physical address mappings

For ICMP:

mib2_icmp Throughput and operation

For TCP:

mib2_tcp Throughput and operation mib2_tcpConnEntry Table of TCP connections

For UDP:

mib2_udp Throughput and operation mib2_udpEntry UDP endpoints in the "listen" state

Forperformance analysis, the structures defined for “throughput andoperation” are of the most interest. The remaining structures arebookkeeping structures and are of use in some cases for performanceanalysis, but in most cases, not. Figure 15-22 demonstrates the retrieval of themib2_ip,mib2_icmp,mib2_tcp, andmib2_udp structures.

Figure 15-22. Extracting MIB Structures from Streams Modules
Code View: Scroll / Show All
#include  
#include
#include
#include
#include
#include
#include
#include

#include
#include
#include

#include
#include

#include
#include
#include

/* 260 refers to Solaris 2.6.0 */
#if SOLARIS_VERSION >= 260
# include
#endif

#define DEV_TCP "/dev/tcp"
#define ARP_MODULE "arp"
#define TCP_MODULE "tcp"
#define UDP_MODULE "udp"

static void fatal(char *format, ...);

void
get_mib_data(mib2_ip_t *ip_struct,
mib2_icmp_t *icmp_struct,
mib2_tcp_t *tcp_struct,
mib2_udp_t *udp_struct)
{
char *trash = 0;
char buf[BUFSIZ];
int trash_size = 0;
int mib_sd;
int flags;
int n;
void *p;
struct strbuf control;
struct strbuf data;
struct T_optmgmt_req *req_opt = (struct T_optmgmt_req *) buf;
struct T_optmgmt_ack *ack_opt = (struct T_optmgmt_ack *) buf;
struct T_error_ack *err_opt = (struct T_error_ack *) buf;
struct opthdr *req_hdr;

/* open the stream and set up the streams modules */
mib_sd = open(DEV_TCP, O_RDWR);
if (mib_sd == -1)
fatal("open of %s failed", DEV_TCP);

while (ioctl(mib_sd, I_POP, &n) != -1) ;
if (ioctl(mib_sd, I_PUSH, ARP_MODULE) == -1)
fatal("cannot push %s module", ARP_MODULE);
if (ioctl(mib_sd, I_PUSH, TCP_MODULE) == -1)
fatal("cannot push %s module", TCP_MODULE);
if (ioctl(mib_sd, I_PUSH, UDP_MODULE) == -1)
fatal("cannot push %s module", UDP_MODULE);

/* setup the request options */
req_opt->PRIM_type = T_OPTMGMT_REQ;
req_opt->OPT_offset = sizeof(struct T_optmgmt_req );
req_opt->OPT_length = sizeof(struct opthdr );
#if SOLARIS_VERSION >= 260
req_opt->MGMT_flags = T_CURRENT;
#else
req_opt->MGMT_flags = MI_T_CURRENT;
#endif

/* set up the request header */
req_hdr = (struct opthdr *) & req_opt[1];
req_hdr->level = MIB2_IP;
req_hdr->name = 0;
req_hdr->len = 0;

/* set up the control message */
control.buf = buf;
control.len = req_opt->OPT_length + req_opt->OPT_offset;

/* send the message downstream */
if (putmsg(mib_sd, &control, 0, 0) == -1)
fatal("cannot send control message");

/* set up for the getmsg */
req_hdr = (struct opthdr *) & ack_opt[1];
control.maxlen = sizeof buf;

for (;;) {
/* start reading the response */
flags = 0;
n = getmsg(mib_sd, &control, 0, &flags);
if (n == -1)
fatal("cannot read control message");

/* end of data? */
if ((n == 0) &&
(control.len >= sizeof(struct T_optmgmt_ack )) &&
(ack_opt->PRIM_type == T_OPTMGMT_ACK) &&
(ack_opt->MGMT_flags == T_SUCCESS) &&
(req_hdr->len == 0))
break;

/* if an error message was sent back */
if ((control.len >= sizeof(struct T_error_ack )) &&
err_opt->PRIM_type == T_ERROR_ACK)
fatal("error reading control message");
/* check for valid response */
if ((n != MOREDATA) ||
(control.len < sizeof(struct T_optmgmt_ack )) ||
(ack_opt->PRIM_type != T_OPTMGMT_ACK) ||
(ack_opt->MGMT_flags != T_SUCCESS))
fatal("invalid control message received");

/* cause the default case to happen */
if (req_hdr->name != 0)
req_hdr->level = -1;

switch (req_hdr->level) {
case MIB2_IP:
p = ip_struct;
break;
case MIB2_ICMP:
p = icmp_struct;
break;
case MIB2_TCP:
p = tcp_struct;
break;
case MIB2_UDP:
p = udp_struct;
break;
default:
if ((trash == 0) ||
(req_hdr->len > trash_size)) {
if (trash)
free(trash);
trash = (char *) malloc(req_hdr->len);
if (trash == 0)
fatal("out of memory");
trash_size = req_hdr->len;
}
p = trash;
break;
}

/* read the data from the stream */
data.maxlen = req_hdr->len;
data.buf = (char *) p;
data.len = 0;
flags = 0;

n = getmsg(mib_sd, 0, &data, &flags);
if (n != 0)
fatal("error reading data");
}

if (trash)
free(trash);
close(mib_sd);
}

static void
fatal(char *format, ...)
{
va_list args;

va_start(args, format);
vfprintf(stderr, format, args);
putc('\n', stderr);
exit(1);
}


main()
{
mib2_ip_t ip_struct;
mib2_icmp_t icmp_struct;
mib2_tcp_t tcp_struct;
mib2_udp_t udp_struct;

get_mib_data(&ip_struct, &icmp_struct,
&tcp_struct, &udp_struct);

/* carry on with your newly acquired MIB data */

puts ("udp_struct = {");
printf(" udpInDatagrams = %u\n", udp_struct.udpInDatagrams);
printf(" udpInErrors = %u\n", udp_struct.udpInErrors);
printf(" udpOutDatagrams = %u\n", udp_struct.udpOutDatagrams);
printf(" udpEntrySize = %u\n", udp_struct.udpEntrySize);
puts ("}");

return 0;
}


Datasent upstream as a result of the control message is sent in singleblocks representing the entire mib2 structure as defined in themib2.hinclude file. Individual values cannot be queried. Once the controlmessage is sent, the drivers in the stream will send the entirecorresponding structure.

Aninteresting note is that the mib2 structures contain not only the dataas defined by RFC 1213, “Management Information Base for NetworkManagement of TCP/IP-based internets: MIB-II,” but also other data thatis of interest to applications that query the state of the network codein the kernel. Many of the values contained in these mib2 structures,including the values not defined by RFC 1213, are shown in the outputof thenetstat -s command. The TCP section is described in “Introduction to TCP” on page 57. References to books that are relevant to this section include “Computer Networks” on page 566, “Internetworking with TCP/IP Volume II” on page 567, and “The Magic Garden Explained” on page 567.

The Network Device Driver Interface

TheNetwork Device Driver (NDD) interface gets and sets tunable parametersthat control drivers, specifically, the network stack drivers. Thesedrivers include the IP, TCP, UDP, ICMP, and ARP drivers, and networkinterfaces such ashme.

Inprevious versions of Unix, including SunOS, the variables that wereavailable for tuning were not clearly defined, and modifying them was acomplex task of installingadb commands in system boot files. With the NDD interface comes thenddcommand for interacting with the drivers and viewing and setting theavailable variables. This command must also be placed in system bootfiles for modifying the driver parameters upon boot but is more userfriendly thanadb.

Theprogramming interface for interacting with the drivers to access theNDD variables is not as complex as that for the MIB structures.Accessing the MIB structures required building a stream containing theappropriate streams modules. You can access the NDD variables byopening the character special device representing the network driver inquestion and using theioctl() system call to pass requests to the driver.

Unlike the MIB structures, theNDD variables can be written to as well as read. The sameioctl() interface is used to set variables as to read them, except that a different flag is used within theioctl() request structure.

Figure 15-23 demonstrates how to read allNDD variables from all the protocol drivers.

In this example, you can see that by querying the value of an NDD variable with the name of?,you get a response from the drivers that specifies the names of all ofthe variables available for that driver and whether the variable ismode read-only, read/write, or write-only.

NDDread requests differ from MIB in that MIB reads return the entire MIBstructure for the device in question. With NDD, only one variable at atime can be read or written unless the special value? has been supplied in a read request.

Asalways, be careful what you tune kernel variables to. Improper tuningcould lead to disastrous results or just very poor performance.

Figure 15-23. Dump All the Variables Available Through NDD
Code View:Scroll/Show All
#include  
#include
#include
#include
#include
#include
#include
#include

/*
* big enough to hold tcp_status */
* (hey it’s "virtual" memory, right? :-))
*/
static charndd_buf[65536];

typedef enum {
VAR_INT_T,
VAR_STRING_T

} var_type_t;

typedef struct {
char *var_name;
var_type_tvar_type;
union {
int var_int;
char*var_string;
} var_un;
} var_t;

typedef struct {
char*ndd_dev_name;
int ndd_sd;
} ndd_dev_t;

static int
ndd_name_io(ndd_dev_t *np, int cmd, var_t *vp)
{
char*p;
int i;
struct strioctl str_cmd;

/* open the device if not open already */
if (np->ndd_sd == 0) {
np->ndd_sd = open(np->ndd_dev_name, O_RDWR);
if (np->ndd_sd == -1) {
perror(np->ndd_dev_name);
return - 1;
}
}

/* clear the buffer */
memset(ndd_buf, '\0', sizeof ndd_buf);

/* set up the stream cmd */
str_cmd.ic_cmd = cmd;
str_cmd.ic_timout = 0;
str_cmd.ic_dp = ndd_buf;
str_cmd.ic_len = sizeof ndd_buf;

/* set up the buffer according to whether it’s a read or write */
switch (cmd) {
case ND_GET:
strcpy(ndd_buf, vp->var_name);
break;
case ND_SET:
switch (vp->var_type) {
case VAR_INT_T:
sprintf(ndd_buf, "%s%c%d", vp->var_name,
'\0', vp->var_un.var_int);
break;
case VAR_STRING_T:
sprintf(ndd_buf, "%s%c%s", vp->var_name,
'\0', vp->var_un.var_string);
break;
default:
/* ? */
return - 1;
}
break;
default:
/* ? */
return - 1;
}

/* retrieve the data via ioctl() */
if (ioctl(np->ndd_sd, I_STR, &str_cmd) == -1) {
perror("ioctl");
return - 1;
}

/* if it's a read, put it back into the var_t structure */
if (cmd == ND_GET) {
switch (vp->var_type) {
case VAR_INT_T:
vp->var_un.var_int = atoi(ndd_buf);
break;
case VAR_STRING_T:
for (i=0, p=ndd_buf; i if (*p == '\0')
*p = '\n';
if (vp->var_un.var_string)
free(vp->var_un.var_string);
vp->var_un.var_string = strdup(ndd_buf);
break;
default:
/* ? */
return - 1;
}
}
return 0;
}

main()
{
static ndd_dev_t dev_names[] = {
{ "/dev/ip", 0 },
{ "/dev/tcp", 0 },
{ "/dev/udp", 0 },
{ "/dev/arp", 0 },
{ "/dev/icmp", 0 },
{ 0, 0 } };
ndd_dev_t * np;
static var_t var = {
"tcp_status", VAR_STRING_T, 0
};
/* traverse all the devices and dump the variables' names */
for (np = dev_names; np->ndd_dev_name; np++) {
if (ndd_name_io(np, ND_GET, &var) != -1)
printf("For %s\n\n%s\n", np->ndd_dev_name,
var.var_un.var_string);
}
return 0;
}