
Chapter 13. Network I/O: Introduction

The first thing that usually comes to mind when a system administrator hears that there might be some network contention issues is to run netstat. The netstat command, the "net" equivalent of vmstat or iostat, provides a quick-and-dirty way to get an overview of how your network is configured. Unlike vmstat or iostat, however, the command defaults usually don't give you as much information as you'd probably like. You need to understand the correct usage of netstat and how best to use it when monitoring your system.
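As a first pass, a few common invocations (a sketch; the exact flags supported can vary slightly by AIX level) give you interface-, adapter-, and protocol-level views:

# netstat -in     # per-interface packet, error, and collision counts
# netstat -v      # detailed device driver statistics for each adapter
# netstat -s      # per-protocol statistics (IP, ICMP, TCP, UDP)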

The netstat facility isn't really a monitoring tool in the sense that vmstat and iostat are. Other, more suitable tools (which we'll get to later) are available to help you monitor your network subsystem. At the same time, you can't really start to monitor until you have a thorough understanding of the various components related to network performance. These components include your network adapters, your switches and routers, and how you are using virtualization on your host logical partitions.

If you determine that you indeed are experiencing a network bottleneck, the solution to the problem might actually lie outside your immediate host machine. If the network switch is improperly configured on the other end, there is little you can do. Of course, you might be able to point the network team in the right direction. You should also spend time gathering overall information about your network.

How can you troubleshoot your network devices unless you really understand the network? In the next few chapters, we'll look at specific AIX network tracing tools, such as netpmon, to see how they can help you isolate your bottlenecks.

No matter which subsystem you want to tune, remember that systems tuning is an ongoing process. As I've stated before, the best time to start monitoring your systems is at the beginning, before you have any problems and when users aren't screaming. You need a baseline of network performance so that you know what the system looks like when it's behaving normally. And remember: be careful to make changes one at a time so you can assess the actual impact of each change.

13.1. Network I/O Overview

Understanding the network subsystem as it relates to AIX is not an easy undertaking. From a hardware and software standpoint, there are far fewer areas you need to investigate when you examine CPU and memory bottlenecks. Tuning disk I/O is more complex because many more issues affect performance, particularly during the architecting and build-out of systems. In this respect, tuning the network is probably most similar to tuning disk I/O, a fact that's actually not too surprising, given that both relate to I/O.

Let's start by examining the AIX Transmission Control Protocol/Internet Protocol (TCP/IP) layers, which are depicted in Figure 13.1.

Figure 13.1. AIX TCP/IP layers

From this illustration, you can clearly see that there is more to network monitoring than simply running netstat and looking for collisions. From the application layer down through the media layer, areas need to be configured, monitored, and tuned. At this point, you should notice some similarities between this illustration and the Open Systems Interconnection (OSI) model, which divides network architecture into seven layers (from top to bottom):

  • Application

  • Presentation

  • Session

  • Transport

  • Network

  • Data link

  • Physical

Perhaps the most important concept to understand is that each layer on the host machine communicates with the corresponding layer on the remote machine. Application programs transmit data using either the User Datagram Protocol (UDP) or TCP at the transport layer. These protocols receive the data from whatever application you are using and divide that data into packets. The packets themselves differ depending on whether a packet is a UDP packet or a TCP packet. In general, UDP is faster, while TCP is more reliable.

There are many tunable parameters to look at, and we'll get to these later. To begin, you might want to start familiarizing yourself with the no command, which is the utility designed to make most network changes. From a hardware perspective, it is critical for you to understand the components that must be configured appropriately to optimize performance.
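For instance, listing and changing tunables with no looks like this (the tunable and value shown are purely illustrative, not a recommendation):

# no -a                          # list all network tunables and their current values
# no -o tcp_sendspace            # display a single tunable
# no -o tcp_sendspace=262144     # change it on the running system
# no -p -o tcp_sendspace=262144  # also make the change persist across reboots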

Although you might work together with the network teams that manage your switches and routers, you probably won't be configuring those devices unless you're a small shop or a one-person IT department. The most important component you'll work with is the network adapter. Most of your adapters will probably be some version that supports Gigabit Ethernet, such as a 10/100/1000 Mbps Ethernet card. Let's review the important concepts you'll need to work with here.

13.2. NFS

Introduced by Sun Microsystems in 1984, the Network File System (NFS) lets clients access files over a network as if the files were locally attached disks. Version 2 of NFS, introduced in 1989, operated exclusively over UDP. Version 3, which debuted in 1995, added TCP support, which helped NFS thrive over a wide area network (WAN). Version 4, introduced in 2000, was the first version developed by the Internet Engineering Task Force (after Sun relinquished control of NFS development). NFS V4 was also the first version to provide stateful support, whereby both the client and the server maintain current information about open files and file locks.

NFS was further enhanced in 2003 under RFC 3530, and it is this standard that AIX supports. AIX 5.3 supports three versions of NFS: Versions 2, 3, and 4. The default version is Version 3. (For Red Hat Linux, the default NFS version is Version 4.) You can choose the NFS version during the actual mounting of the file system, and you can run different NFS versions on the same server.

NFS now supports both TCP and UDP. Because UDP is faster (it does less), some environments that favor optimum performance (on a LAN) over reliability might perform better with UDP. TCP is more reliable (because it establishes connections) and provides better performance over a WAN (because its flow control helps minimize network latency).
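Both the version and the transport can be selected at mount time; for example (the server name and paths here are hypothetical):

# mount -o vers=3,proto=udp nfssrv:/export/data /data    # V3 over UDP for a fast LAN
# mount -o vers=4 nfssrv:/export/data /data              # V4, which runs over TCP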

A benefit of NFS is that it acts independently of machine types and operating systems. It achieves this independence through the use of remote procedure calls (RPCs), as depicted in Figure 13.2.

Figure 13.2. Interaction between client and server

The figure illustrates how NFS clients A and B access the data on NFS server Z. The client computers first request access to the exported data by mounting the file system. Then, when a client thread tries to process data within the NFS-mounted file system, the data is redirected to the biod daemon, which takes the data through the LAN to the NFS server and its nfsd daemon. The server uses nfsd to export the directories that are available to its clients. As you can see, you'll need to tune the network and I/O parameters. If Server Z is performing poorly, that obviously affects all of its NFS clients. If possible, tune the server specifically to function as an NFS server (more about this later).

What about the biod daemon? This daemon is required to perform both read-ahead and write-behind requests. biod improves overall NFS performance as it either empties or fills up the buffer cache, acting as a liaison to the client applications. As shown in the figure, the biod daemon sends the requests to the server. On the other side, nfsd is the liaison that provides NFS services to clients. When the server receives biod communications from the client, it uses the nfsd daemon until the request is completed.
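You can check and resize both daemon pools with standard AIX commands; a minimal sketch (the counts are examples only, not tuning advice):

# lssrc -g nfs     # show the status of the NFS subsystem daemons
# chnfs -n 32      # on the server, run 32 nfsd daemons
# chnfs -b 64      # on the client, run 64 biod daemons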

How is it that NFS was not stateful until Version 4, even though it could use TCP as early as Version 3? Figure 13.3 illustrates where NFS resides in relation to the TCP/IP stack and the OSI model.

Figure 13.3. NFS relationship to OSI and TCP/IP

Because NFS uses remote procedure calls, it does not reside on the transport stack. RPCs are a library of procedures that enable the client and server processes to execute system calls as if they were executed in their own address spaces. In a typical UDP NFS Version 2 or 3 implementation, the NFS server sends its client a type of cookie after the clients are authorized to share the volume. This approach helps minimize network traffic. The problem is that if the server goes down, clients will continue to inundate the network with requests. That is why there is a preference for using TCP. Only Version 4 provides stateful connections, and only Version 4 requires TCP as its transport protocol.

NFS Version 4 has no interaction with portmap or other daemons such as lockd and statd, because their functions are rolled into the protocol itself. In versions other than Version 4, the portmapper is used to register RPC services and to provide the port numbers for the communications between clients and servers. External Data Representation (XDR) provides the mechanism that RPC and NFS use to ensure reliable data exchange between client and server. This interaction takes place in a platform-independent way for the exchange of binary data, thus addressing the possibility of systems representing data in different ways. Using XDR, data can be interpreted correctly, even on platforms that are not alike.
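You can see this registration in action against a pre-V4 server; for example (the host name is hypothetical):

# rpcinfo -p nfssrv    # list registered RPC programs (mountd, nlockmgr, status, nfs) and their ports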

13.3. Media Speed

Network adapters communicate with other devices based on how the media speed is configured. Although other choices are available, you should configure your card for either 100 Mbps full duplex or auto-negotiation. With auto-negotiation, both adapters try to communicate using the highest possible speed. The documentation might tell you that you need to configure the card this way (IBM even defaults to auto-negotiation on the system), but most senior AIX administrators I know prefer to set it to full duplex to ensure they receive the fastest possible adapter speed. If this setting doesn't function properly, you should work with the appropriate network teams to resolve the problem before deployment.

I prefer to take more time initially rather than set the adapter to an option that might cause slower speeds as a result of poorly configured switches. The lsattr command gives you the information you need. Used with the en prefix, it displays your driver parameters; the ent prefix displays your hardware parameters. In the following case, the interface is set to auto-negotiate.

# lsattr -El ent0

alt_addr        0x000000000000    Alternate Ethernet Address          True
busintr         166               Bus interrupt level                 False
busmem          0xc8030000        Bus memory address                  False
chksum_offload  yes               Enable RX Checksum Offload          True
intr_priority   3                 Interrupt priority                  False
ipsec_offload   no                IPsec Offload                       True
large_send      no                Enable TCP Large Send Offload       True
media_speed     Auto_Negotiation  Media Speed                         True
poll_link       no                Enable Link Polling                 True
poll_link_timer 500               Time interval for Link Polling      True
rom_mem         0xc8000000        ROM memory address                  False
rx_hog          1000              RX Descriptors per RX Interrupt     True
rxbuf_pool_sz   1024              Receive Buffer Pool Size            True
rxdesc_que_sz   1024              RX Descriptor Queue Size            True
slih_hog        10                Interrupt Events per Interrupt      True
tx_preload      1520              TX Preload Value                    True
tx_que_sz       8192              Software TX Queue Size              True
txdesc_que_sz   512               TX Descriptor Queue Size            True
use_alt_addr    no                Enable Alternate Ethernet Address   True
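To pin the adapter at a fixed speed instead, chdev does the job (a sketch; valid media_speed values depend on the adapter, and the interface should be down when you change it):

# chdev -l ent0 -a media_speed=100_Full_Duplex
# chdev -l ent0 -a media_speed=100_Full_Duplex -P    # if the adapter is busy, -P defers the change until the next boot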

You should also check your adapter firmware levels to make sure they're up-to-date. I've seen many network problems fixed by updating to the latest levels of firmware. The lscfg command reports firmware information:

# lscfg -vp | grep -p ROM

10/100 Mbps Ethernet PCI Adapter II:
Part Number................. 09P5023
FRU Number.................. 09P5023
EC Level.................... H10971A
Manufacture ID.............. YL1021
Network Address............. 0002556FC98B
ROM Level.(alterable)....... SCU015
Product Specific.(Z0)...... A5204207
Device Specific.(YL)........U0.1-P1-I1/E1

10/100/1000 Base-TX PCI-X Adapter:
Part Number................. 00P3056
FRU Number.................. 00P3056
EC Level.................... H11635A
Manufacture ID.............. YL1021
Network Address............. 00096B2E31BD
ROM Level.(alterable)....... GOL002
Device Specific.(YL)........U0.1-P1/E2

13.4. Network Subsystem Memory Management

You should also start to familiarize yourself with the memory management facility of the network subsystem. This facility makes use of data structures called mbufs that are used to store kernel data for incoming and outbound traffic. The buffers themselves can range in size from 32 bytes to 16,384 bytes. The buffer pools are created by making allocation requests to the Virtual Memory Manager. On a symmetric multiprocessing box, the memory pool is split evenly among the processors. An important point to note is that a processor cannot borrow from another processor's memory pool.
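Monitoring this facility is straightforward (a minimal sketch using standard commands):

# netstat -m       # mbuf pool usage, including failed allocation requests
# no -o thewall    # upper bound, in KB, on memory the network subsystem can use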

13.5. Virtual and Shared Ethernet

Two other concepts to be familiar with are virtual Ethernet and shared Ethernet.

First supported on AIX 5.3 on POWER5, virtual Ethernet allows for interpartition, IP-based communications between logical partitions on the same frame. This functionality is achieved through the use of a virtual I/O switch. The Ethernet adapters themselves are created and configured using the Hardware Management Console (HMC).

Shared Ethernet is one of the features of Advanced Power Virtualization (APV), or PowerVM. It enables the use of virtual I/O servers (VIOs), whereby several host machines can actually share one physical network adapter. Shared Ethernet is typically used in environments that don't require substantial network bandwidth.
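On the VIO server itself, you can see which physical adapter backs a shared Ethernet adapter; for example, from the padmin restricted shell (a sketch; adapter names will vary):

$ lsmap -all -net    # show virtual-to-physical Ethernet mappings, including the SEA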

Although an in-depth discussion of virtualization is beyond the scope of this book, you should understand that if you are using virtualization, there might be other reasons for your bottleneck outside of what you're doing on the host machine. Virtualization is a wonderful thing, but you need to be careful not to share too many adapters from your VIO server, or you might pay a large network I/O penalty. Use of the appropriate monitoring tools should inform you whether you have a problem. Further, you might want to familiarize yourself with concepts such as Address Resolution Protocol (ARP) and the Domain Name System (DNS), which can also affect network performance and reliability in different ways.