[Horwitz02] Section 11.9-11.12 Planning for Memory and Swap Capacity etc.


Planning for Memory and Swap Capacity

Memory is not an easy resource to plan for, as it can be used in so many different ways and affects so many other subsystems. The most important thing you can do to enhance memory performance is to buy more memory! RAM is a scarce resource on most systems, and avoiding degraded performance from paging and swapping should be your number one goal. Following are some planning techniques that you can use to determine future capacity needs.

Monitor Process Memory Usage

Use top, ps, and pmap to monitor memory usage for the applications running on your systems. You should become familiar with each process's memory utilization habits and immediately recognize a problem when a monitor reveals abnormal data. For example, 16 oracle processes that each consume 430MB of resident memory may seem excessive to the uninitiated administrator, but an administrator who monitors the server every day may dismiss this as completely normal memory utilization for an oracle process. This administrator will be able to tell there is a problem when usage shoots up to 1GB for a few of those processes.
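
For instance, the following commands give a quick picture of which processes are using the most resident memory. This is only a sketch: the --sort option belongs to the Linux procps version of ps, and the PID given to pmap is a placeholder.

# Largest processes by resident set size (Linux procps syntax; on Solaris,
# "/usr/bin/ps -eo pid,user,rss,vsz,args | sort -rn -k 3 | head" is a rough equivalent)
bash$ ps -eo pid,user,rss,vsz,comm --sort=-rss | head -10

# Detailed per-segment memory map of a single process (1234 is a placeholder PID)
bash$ pmap -x 1234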

Allocate More Swap Than You Need

When they add memory to a system, many administrators neglect to adjust the swap space accordingly, which can be a real problem for applications, such as Oracle, that have swap requirements. The easiest way to accommodate new memory is to allocate more swap than you need when configuring the operating system. This “padding” leaves you some breathing room for memory upgrades.
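
To see how much padding you currently have, list the configured swap areas and compare the total with what your applications require. A quick sketch; swap -l is the Solaris command and swapon -s its Linux counterpart.

# Solaris: list swap devices and how much of each is free
bash$ swap -l

# Linux: summarize configured swap areas
bash$ swapon -s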

One Swap Partition per Disk!

Another way to adjust swap to meet the needs of additional memory is to add swap partitions on other disks that have available space. However, you should never put more than one swap partition on the same disk, as heavy paging activity could send the disk heads into a frenzy writing to two physically separate areas of the disk.


Gauging Network Performance and Capacity

Fine-tuning CPU usage, memory, swap, and disk performance will do you no good if your servers cannot be reached over the network. System administrators often take connectivity and bandwidth for granted, assuming one machine will be able to talk to another in a timely fashion.

System administrators must plan network capacity and optimize network performance, just as they do any other system resource. But the network offers special challenges, because it is a shared medium used by all of the systems connected to it. To optimize network performance, system administrators must learn to gauge specific performance metrics and understand what factors can affect them.

Bandwidth Versus Latency

Network performance is governed by two major factors: bandwidth and latency. In previous sections of this chapter, you learned that bandwidth is the maximum data transfer rate of a connection, usually expressed in bits per second. Latency is the delay between the time a packet is sent from its source and the time it is received by the destination. Latency is independent of the bandwidth of the connection. Many system administrators incorrectly gauge the performance of their network by its bandwidth alone, without considering latency.

ping can be used as a “quick and dirty” method to test the latency of a connection. ping measures the round-trip time of the ICMP echo requests it sends to the destination hosts. In one case, using ping to test the latency on two T1 lines in an office yielded some surprising results. The T1 lines were from different providers; one was a full 1.544Mb/s line while the other was a fractional T1 at 768Kb/s, half the bandwidth of the full T1. Latency was tested by pinging the router on the other side of each T1 line once per second for ten seconds. The tests produced the following results (addresses changed for privacy):

# TEST FULL T1
bash$ ping -s 192.168.0.5 56 10
PING 192.168.0.5: 56 data bytes
64 bytes from 192.168.0.5: icmp_seq=0. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=1. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=2. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=3. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=4. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=5. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=6. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=7. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=8. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=9. time=6. ms

----192.168.0.5 PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms) min/avg/max = 6/6/6

# TEST FRACTIONAL T1
bash$ ping -s 172.16.0.5 56 10
PING 172.16.0.5: 56 data bytes
64 bytes from 172.16.0.5: icmp_seq=0. time=4. ms
64 bytes from 172.16.0.5: icmp_seq=1. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=2. time=4. ms
64 bytes from 172.16.0.5: icmp_seq=3. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=4. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=5. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=6. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=7. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=8. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=9. time=3. ms

----172.16.0.5 PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms) min/avg/max = 3/3/4



Linux ping

The Linux version of ping sends one ICMP echo request every second by default, and does not require the -s flag.


The full T1 had an average round-trip time of 6 milliseconds, while the fractional T1, with half the bandwidth of the full T1, had an average round-trip time of 3 milliseconds. The packets sent over the fractional T1 made the trip in half the time it took the packets to travel on the full T1. These results indicated that latency operated independently of bandwidth in this situation. This outcome is not always the case, though; on very congested lines, bandwidth limitations may delay packets from reaching their destinations, increasing latency.

Latency on the order of a few milliseconds is nothing to be concerned about, but when it begins to rise closer to 100 milliseconds, your network communications will begin to experience significant delays and interruptions. These problems are especially noticeable in interactive applications such as Telnet or SSH, where you expect a character to appear almost immediately after you press a key on your keyboard.
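
If you would rather not watch ping output by hand, a small wrapper script can pull out the average round-trip time and warn when it crosses a threshold. The following is a minimal sketch that assumes the Linux ping summary format ("rtt min/avg/max/mdev = ..."); the host and the 100-millisecond threshold are placeholders.

#!/bin/sh
# check_latency.sh -- warn when the average RTT to a host exceeds a threshold.
# Assumes the Linux ping summary line; other systems format it differently.
HOST=${1:-192.168.0.5}
THRESHOLD=100    # milliseconds

# -q suppresses the per-packet lines; if the host never answers, no summary
# line is printed and the script stays silent.
ping -c 10 -q $HOST | awk -F/ -v host="$HOST" -v thresh="$THRESHOLD" '
    /^rtt/ {
        avg = $5 + 0        # 5th slash-separated field is the average RTT
        if (avg > thresh)
            printf "WARNING: average RTT to %s is %s ms (threshold %s ms)\n", host, avg, thresh
        else
            printf "OK: average RTT to %s is %s ms\n", host, avg
    }'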

Modem Latency

The average cable modem or DSL broadband connection has latency on the order of tens of milliseconds, while traditional analog modems have latencies in the hundreds of milliseconds. This is one reason why Web surfing over a modem seems so slow.


Many factors contribute to latency, but the usual suspects are the following:

  • Distance

  • Physical medium (such as optical fiber versus copper)

  • Number of hops to the destination

  • Router load

  • Router queuing priorities

Counting Hops

One of the factors that contributes to latency is the number of hops between a packet's source and its destination. A hop is simply a router between the source host and the destination host. The destination host is included in the number of hops, so the number of hops between two hosts on the same network is always one (assuming no bridging). Hops are the units of logical distance on a network, as each hop introduces additional latency to a connection.

Every packet has a TTL (time to live), which is decremented at each hop. If the TTL reaches zero on a router, the router refuses to route the packet any further and returns an ICMP Time Exceeded control message back to the source host. This control prevents infinite routing loops, which would quickly overload a misconfigured router.

Too many hops can add to latency, so it is often useful to count the number of hops to certain sites using traceroute.

bash$ /usr/sbin/traceroute -n 192.168.5.6
traceroute to 192.168.5.6 (192.168.5.6), 30 hops max, 40 byte packets
1 192.168.1.2 3.711 ms 3.331 ms 2.609 ms
2 192.168.10.5 5.693 ms 5.715 ms 5.670 ms
3 192.168.6.5 5.615 ms 5.534 ms 5.516 ms
4 192.168.5.6 5.975 ms 5.837 ms 5.746 ms

There were four hops from the source host to the destination 192.168.5.6. The times reported to the right of each hop are round-trip times for each of three “probes” that traceroute sends to the destination.
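
If all you need is the hop count itself, you can simply count the numbered lines in the output, as in this small sketch (some versions of traceroute print the header line to standard error, so counting lines that start with a hop number is more reliable than counting every line):

# Count the hops to the destination used in the example above
bash$ /usr/sbin/traceroute -n 192.168.5.6 | grep -c '^ *[0-9]'
4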

Devices are not required to answer requests from ping or traceroute. Most firewalls silently drop these packets, resulting in some skipped hops in the output of traceroute.

$ /usr/sbin/traceroute -n 207.8.173.37
traceroute to 207.8.173.37 (207.8.173.37), 30 hops max, 40 byte packets
1 192.168.1.2 3.030 ms 2.796 ms 2.864 ms
2 208.172.25.1 10.044 ms 9.878 ms 9.985 ms
3 208.172.18.62 10.673 ms 10.508 ms 10.487 ms
4 206.24.194.61 13.901 ms 14.281 ms 12.099 ms
5 206.24.195.226 12.928 ms 13.147 ms 11.980 ms
6 144.232.7.105 12.391 ms 12.928 ms 12.361 ms
7 144.232.9.226 63.675 ms 16.622 ms 198.113 ms
8 144.232.14.138 14.910 ms 16.768 ms 14.996 ms
9 144.232.14.42 15.904 ms 15.791 ms 16.542 ms
10 160.81.19.254 15.841 ms 16.488 ms 16.265 ms
11 207.106.31.34 17.621 ms 16.738 ms 16.431 ms
12 207.8.128.93 21.685 ms 20.174 ms 20.364 ms
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *



Measuring Packet Loss

One of the scourges of networking is packet loss, or dropped packets. Packet loss occurs when packets fail to reach their destination, causing delays and retransmissions on the client side. Packet loss can result from a variety of problems, including the following:

  • Broken cables

  • Loose connections

  • Failing network interface

  • Overloaded network interface

  • Malfunctioning switch, firewall, or router

  • Bad routing information

  • Electromagnetic interference on a cable

You can use ping to detect dropped packets, though for low loss rates you may need to run it for quite a while before you actually see evidence of dropped packets. The following output from ping is indicative of a packet loss problem:

bash$ ping -sn 192.168.5.1 56 10
PING 192.168.5.1 (192.168.5.1): 56 data bytes
64 bytes from 192.168.5.1: icmp_seq=0. time=6. ms
64 bytes from 192.168.5.1: icmp_seq=1. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=3. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=4. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=5. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=6. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=8. time=6. ms
64 bytes from 192.168.5.1: icmp_seq=9. time=5. ms

----192.168.5.1 PING Statistics----
10 packets transmitted, 8 packets received, 20% packet loss
round-trip (ms) min/avg/max = 5/5/6

In this output, icmp_seq 2 and 7 are missing; they either did not reach 192.168.5.1 or the responses never made it back. The resulting 20% packet loss is very significant, but in general you shouldn't tolerate any packet loss.

If you experience dropped packets on one of your WAN interfaces, like a T1 to the Internet, call your provider immediately to have it diagnose the problem. If the problem is internal to your own network, make sure all of your cables and connections are secure before moving on to your routers and switches.
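
Because low loss rates may only show up over a long run, it can also help to wrap ping in a loop that records the reported loss percentage at regular intervals. The following is a minimal sketch; it assumes a ping that prints a "% packet loss" summary line (both the Solaris and Linux versions shown in this chapter do), and the host, sample size, and interval are placeholders.

#!/bin/sh
# watch_loss.sh -- log the packet loss percentage to a host every few minutes.
# Adjust the ping flags for your platform (Solaris ping uses "-s host size count").
HOST=${1:-192.168.5.1}

while true; do
    LOSS=`ping -c 100 $HOST | grep 'packet loss' | \
        sed 's/.* \([0-9.]*\)% packet loss.*/\1/'`
    echo "`date` $HOST loss=${LOSS}%"
    sleep 300    # wait five minutes between samples
done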

Measuring Network Errors

When a system detects an error in network communications to or from one of its interfaces, it updates counters in the kernel. You can display these counters with netstat -ni. On Solaris the output looks like this:

bash$ netstat -ni
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 127.0.0.0 127.0.0.1 246 0 246 0 0 0
hme0 1500 192.168.1.0 192.168.1.27 291389 3 17460 0 0 0



Ierrs are input errors and Oerrs are output errors. Red Hat Linux displays the counters a bit differently, as follows:

bash$ netstat -ni
Kernel Interface table
eth0 Link encap:Ethernet HWaddr 00:10:B5:96:0A:5E
inet addr:192.168.1.66 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12026963 errors:3 dropped:0 overruns:0 frame:0
TX packets:5713105 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
Interrupt:16 Base address:0x1000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:3924 Metric:1
RX packets:100713 errors:0 dropped:0 overruns:0 frame:0
TX packets:100713 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

The fast Ethernet interface hme0 on the Solaris server (the first example) had three input errors out of 291,389 total packets received by the interface, for an error rate of 0.001% (one error per 100,000 packets). However, you have no way of determining when these errors happened. To avoid this problem, you may want to run this command with an interval in seconds as the last argument to view up-to-date statistics every few seconds, much like vmstat.
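
To turn those raw counters into a current rate, sample them twice and compare the difference. The script below is a sketch that assumes the Solaris netstat -ni column layout shown above (Ipkts in column 5, Ierrs in column 6); the interface name is a placeholder, and Linux users would parse their own interface statistics instead.

#!/bin/sh
# input_error_rate.sh -- report how many input errors occurred over an interval.
# (Assumes the interface exists; no error checking in this sketch.)
IF=${1:-hme0}
INTERVAL=60    # seconds between samples

sample() {
    netstat -ni | awk -v ifc="$IF" '$1 == ifc { print $5, $6 }'
}

set -- `sample`
P1=$1 E1=$2
sleep $INTERVAL
set -- `sample`
P2=$1 E2=$2

PKTS=`expr $P2 - $P1`
ERRS=`expr $E2 - $E1`
echo "$IF: $ERRS input errors in $PKTS packets over the last ${INTERVAL}s"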

Don't Accept Errors on Ethernet

Although you can expect a very small error rate on serial interfaces, similar to that of a T1, Ethernet interfaces on local area networks should very rarely (if ever) experience network errors. If you do see errors on an Ethernet interface, check all cables and connections to make sure everything is working as it should.


Network errors are caused by a variety of factors, including the following:

  • Malformed packets

  • Failing interface cards or ports

  • Bad cables

  • Duplex mismatches

Buy a Cable Tester

Punctures in the cable shield, loose pin connections, and poorly twisted pairs all take their toll on a signal traveling across that cable. Use a cable tester to verify that a cable is functioning properly; you can purchase one at many retail computer stores and catalogs.


Physics can also play a role in network errors; interference, crosstalk, and signal degradation are common afflictions on many networks.

Interference occurs when an electromagnetic signal from a device such as a monitor causes a signal to be created in a nearby cable. This signal interferes with the normal signals sent across the cable.

Crosstalk is a type of interference that occurs between two wires transmitting different signals in the same cable. Crosstalk is a problem in cables without twisted-pair wiring.

Twisted-Pair Wiring

Category 5 and 6 cabling, used for Ethernet networking, twists pairs of wires in each cable. Twisting reduces the strength of any interference caused in each pair, especially crosstalk.


Signal degradation is the gradual weakening of a signal as it travels across the length of a cable. The signal can become so weak that it is not recognized at its destination. The natural resistance of copper wiring and external interference are the major contributors to signal degradation.

Maximum Cable Lengths

All network transmission protocols (such as 100Mbps Ethernet) specify the maximum length for supported cables; signals can't be guaranteed to transmit clearly in cables beyond that length. For example, 100Mbps Ethernet has a limit of 100 meters over Category 5 cabling. Compare that with Gigabit Ethernet over fiber-optic cabling, which has a limit of 2 kilometers! This incredible distance capability is due to the lack of interference and signal degradation experienced by fiber-optic cables.


Always Check Cables First

Many unexplainable network errors result from bad cables. Before attempting to diagnose any problem involving network errors, check cables to make sure they are connected properly, are not bent, and do not interleave with power cables. When possible, attempt to correct problems by replacing cables before embarking on a lengthy debugging effort.


Retransmitting the packets that were received with errors usually takes care of these problems, but excessive retransmissions can cause serious performance problems on your network. Therefore, although any examination of a particular network interface is bound to reveal some historical errors, your main concern in error monitoring is to track the current rate at which these errors are occurring.

Real-World Example: Errors on Ethernet

A company had recently been getting complaints from customers regarding the company's Web site. The customer's browsers were hanging sporadically when connecting to the site, at random times and for random URLs. The Web server logs showed no record of a connection from clients during the times reported, so the system administrator suspected a network problem. Using SNMP to query the network interface statistics on the firewall in front of the Web servers revealed an input error rate of 0.1%, or an error in 1 out of every 1,000 packets, on the company's 100Mbps Fast Ethernet link to the Internet. That's a high error rate for any interface. After the ISP confirmed that there were no network problems on its end, the company's system administrator decided to try rebooting the firewall. Amazingly enough, the error rate shot back down to 0%, and the customer complaints stopped. In this case, a moderate error rate didn't cause a major outage, but it was enough to cause some customer hassles with dropped packets. If the administrator had set up a monitor to proactively watch error rates over time (see the next section for this type of monitor), he could have caught this well before the majority of users had a chance to complain.


Measuring Collisions

A collision occurs when two machines on a shared network medium attempt to transmit data at the same time. Each machine detects the collision and retransmits the packet after waiting for a random interval to reduce the chance that the collision will reoccur. Collisions rarely occur on switched Ethernet networks because each port sees only traffic destined for the machine on that port. Collisions, however, happen periodically on hubs, where traffic is broadcast to all ports.

On an unswitched network like that of an Ethernet hub, you should consider a collision rate of over 15% severe enough to warrant further investigation. The common solution is to move to a switched network or to reduce the number of nodes on the unswitched network.

A collision rate of over 5% on a switched network is cause for worry. Fortunately, the usual cause of collisions on a switched network is a duplex mismatch, which you learn about in the following section.
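
You can compute the collision rate from the same netstat -ni counters by dividing the Collis column by the output packet count (Opkts). A one-line sketch, again assuming the Solaris column layout and the hme0 interface from the earlier example:

# Collision rate = collisions / output packets
# (in the Solaris output above, Opkts is column 7 and Collis is column 9)
bash$ netstat -ni | awk '$1 == "hme0" { printf "%.2f%% collisions\n", ($9 / $7) * 100 }'
0.00% collisions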

Duplex Considerations

A connection is full-duplex if both sides can transmit data at the same time, and half-duplex if only one side can transmit at once. Full-duplex networks provide the lowest latency, because they enable concurrent transmissions. If both sides of a half-duplex connection transmit at once, a collision occurs; this is the only reason you should see collisions on a switched network. Most modern Ethernet hardware supports full-duplex mode, so you should take advantage of it wherever possible.

Most Ethernet hardware also supports autonegotiation, which automatically determines the speed and duplex of a connection between two network interfaces. Unfortunately, autonegotiation was one of the last specifications put into the Ethernet standard, so many devices do not support it correctly. Incorrect autonegotiation support can result in one side thinking the connection is full-duplex while the other side thinks it is half-duplex, causing errors and collisions on each interface.

You can see the duplex mode of an interface on Solaris by querying the link_mode of an interface using ndd; 0 means half-duplex, and 1 means full-duplex, as follows:

# ndd /dev/hme link_mode
1

On Free/Net/OpenBSD, the duplex is listed in ifconfig output.

bash$ ifconfig xl0
xl0: flags=8843 mtu 1500
media: Ethernet 100baseTX full-duplex
status: active
inet 192.168.1.4 netmask 0xffffff00 broadcast 192.168.1.255

On Red Hat Linux, duplex configuration is driver-specific, and there is no standard interface for querying duplex status. The best way to figure out the duplex of an interface in this case is to run dmesg | grep -i duplex, which displays buffered kernel messages containing the word “duplex.” For example,

$ dmesg | grep -i duplex
eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.
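
On Linux systems that ship the mii-tool or ethtool utilities (neither is assumed by this chapter, and their output formats vary by version), you can also query the negotiated link mode directly rather than searching the kernel messages:

# Query negotiated speed and duplex where these utilities are installed
bash$ /sbin/mii-tool eth0
bash$ /sbin/ethtool eth0 | grep -i duplex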

Tuning Network Performance

Even the most experienced network professionals struggle to master all of the techniques of network performance tuning. Even so, just performing a few basic tuning tasks can greatly increase the performance and reliability of your network communications. The following sections discuss some of the most basic network performance tuning tasks.

Hard-coding Duplex Modes

As you learned in the previous section, full-duplex communications can decrease latency on a network connection by allowing simultaneous two-way communications between two interfaces. However, full-duplex can be used only in switched environments where there is only one peer on the same network segment. Machines connected to hubs are relegated to half-duplex mode.

You should not let autonegotiation decide the duplex for you in a production environment; it is just too unreliable. Duplex mismatches caused by botched autonegotiation are not uncommon in multiple-architecture environments, and they can cause network errors and high latency, two problems you want to avoid at all costs. The solution is to hard-code the proper duplex mode into your switches and servers to disable autonegotiation altogether.

On a Cisco switch running IOS 12.0, you can hard-code the duplex as follows:

switch#conf t
Enter configuration commands, one per line. End with CNTL/Z.
switch(config)#int Fa0/16
switch(config-if)#duplex full
switch(config-if)#

Solaris is a bit more complicated (as usual): You must disable all of the other possible modes and enable only the speed and duplex combinations that you want. You can accomplish this in /etc/system, as follows; the disabling takes effect upon reboot (you can use ndd to perform these operations without rebooting, but the settings will not persist across reboots):

# Full Duplex 100 Mb
set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=1

# Half Duplex 100 Mb
set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=0
set hme:hme_adv_100hdx_cap=1

# Full Duplex 10 Mb
set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=0
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10fdx_cap=1
set hme:hme_adv_10hdx_cap=0

# Half Duplex 10 Mb
set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=0
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10fdx_cap=0
set hme:hme_adv_10hdx_cap=1

You can configure a NIC in OpenBSD to be full duplex with ifconfig, as follows, but only if your card supports it.

# ifconfig xl0 media 100baseTX mediaopt full-duplex

In order to persist across reboots, you must also configure this in /etc/hostname.INT, where INT is the name of your network interface. For example, in /etc/hostname.xl0,

inet 10.1.1.1 0xffffff00 10.1.1.255 media '100baseTX' mediaopt 'full-duplex'



The settings for Linux NICs are vendor-dependent, and there is no generic method to hard-code the duplex. See your NIC vendor's documentation for more details.

Prioritizing Important Traffic

A WAN connection like a T1 or T3 is usually not limited to carrying just one protocol. A typical T1 to the Internet might carry well over ten individual Internet protocols at any one time during peak usage, including HTTP, SMTP, and DNS. A T3 between two data centers might carry more application-specific traffic, such as Oracle SQL*Net or NFS. In either case some types of traffic are probably more important than others, and you should prioritize them to ensure their timely transport.

In general, you can prioritize network traffic into four categories. The highest priority traffic is usually that of interactive applications, such as Telnet or SSH, as their users expect speedy responses every time a key is pressed. These are also the protocols you use to log into remote servers, so it is important that they function efficiently at all times.

Another type of traffic is that of supporting your technology infrastructure, including DNS. Without DNS, your servers and others querying your name servers cannot resolve names into IP addresses, and all communications to named hosts will stall. These types of protocols should be prioritized just below interactive traffic.

Next in the priority list are your application protocols. If running Oracle SQL queries over a WAN connection is critical to your applications, you should prioritize SQL*Net on port 1521. Below these application protocols lies everything else: protocols that aren't important to your organization or that aren't time-sensitive. Routers usually lump unspecified protocols into this default pool.

Prioritize Traffic on the Correct Router

On a private WAN link such as a T1 between your office and data center, priorities should be set on the router that sends most of the traffic. This ensures that no traffic crosses the WAN link before it should.


An e-commerce company with several Web applications might prioritize its traffic as follows:

  1. Telnet (TCP 23), SSH (TCP 22)

  2. DNS (TCP, UDP 53), SNMP (UDP 161)

  3. HTTP (TCP 80), HTTPS (TCP 443)

  4. Other traffic

This order would be represented on a Cisco router running IOS 12.0 as follows:

queue-list 1 protocol ip 1 tcp 22
queue-list 1 protocol ip 1 tcp telnet
queue-list 1 protocol ip 2 udp domain
queue-list 1 protocol ip 2 tcp domain
queue-list 1 protocol ip 2 tcp 161
queue-list 1 protocol ip 3 tcp www
queue-list 1 protocol ip 3 tcp 443
queue-list 1 default 4
queue-list 1 queue 1 limit 40
queue-list 1 queue 2 limit 80
queue-list 1 queue 3 byte-count 2000 limit 120
queue-list 1 queue 4 byte-count 3000 limit 160

Real-World Example: Sluggish SSH

Programmers at a company with a T1 line from the office to the data center complained to the system administrator that they were not able to log in via SSH to any of the servers at the data center. Upon further investigation, the administrator saw almost 100% utilization on the T1, all in Oracle SQL*Net traffic on port 1521. A long-running query returning hundreds of megabytes of data was clogging the T1, eclipsing all other traffic with sheer volume, including SSH. The administrator quickly remedied the problem by setting up priorities on the router to give the interactive SSH protocol priority over other bandwidth-hungry protocols like SQL*Net.


Tuning TCP Timers

TCP is a stateful protocol, and as such each side is responsible for maintaining the connection with the other side. Various timers in the Unix kernel are employed to assist in this process, and tuning some of them can provide a big boost in capacity and performance. The Keepalive and TIME_WAIT timers are two of the most commonly used of these devices, as discussed in the sections that follow.

Keepalive Timer

If a client crashes or is otherwise interrupted with an open TCP connection to a server, the connection is not immediately broken because the client did not explicitly close it. This connection would remain open indefinitely if not for the keepalive timer, which specifies how long these one-sided connections can remain open.

On Solaris the default value of the keepalive timer is 7,200,000 milliseconds, or 2 hours. On a busy server with thousands of clients, if each broken connection warrants a 2-hour wait for its port to be reclaimed, the server may run out of available ports, causing new incoming connections to be refused. A more reasonable value on a busy server like this would be 5 minutes. You can use ndd to set the timer in milliseconds, as follows:

# ndd -set /dev/tcp tcp_keepalive_interval 300000
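
It is worth reading the current value before you change it (omit -set to read a parameter). Linux exposes an analogous tunable through the proc filesystem; note that the Linux parameter is expressed in seconds rather than milliseconds, and its stock default is likewise two hours. A sketch:

# Read the current Solaris setting (milliseconds)
bash$ ndd /dev/tcp tcp_keepalive_interval
7200000

# The analogous Linux tunable, in seconds
bash$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200

Writing a new value to that proc file (or setting it with sysctl) as root takes effect immediately but, like ndd, does not persist across reboots unless you also record it in a startup file such as /etc/sysctl.conf.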

TIME_WAIT Interval

The TIME_WAIT interval determines the amount of time the kernel will wait before reusing a port from a closed TCP connection. The reason behind this delay is data integrity: what happens if the server closes the connection before the client is ready? The client may still be in the process of sending data to the server; if the server port were reused, that data might go to the wrong process, causing all kinds of trouble.

The TIME_WAIT interval is a good thing, but it also creates an interesting problem. If a large number of connections arrive at a server and are closed successfully, the ports associated with those connections must wait to be reused for the TIME_WAIT interval. Considering the default interval is 4 minutes on Solaris, this is a long time to wait for hundreds or even thousands of a limited number of ports to be reused. You can tune TIME_WAIT to 1 or 2 minutes with ndd, as follows:

# ndd -set /dev/tcp tcp_time_wait_interval 60000
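
Here, too, you can read the value before changing it. Note that some older Solaris releases name this parameter tcp_close_wait_interval rather than tcp_time_wait_interval, so check which name your release uses; the following sketch assumes the newer name.

# Read the current TIME_WAIT interval in milliseconds (240000 = 4 minutes)
bash$ ndd /dev/tcp tcp_time_wait_interval
240000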

Planning for Future Network Capacity

Network capacity is one of the most limiting factors in any technology environment, due to the large amount of infrastructure needed to implement it. For example, just a single T1 from your office to the Internet requires a physical circuit (a PRI line) from the local telephone company, CSU/DSU units to translate the T1 signal, a router to connect the CSU/DSU units to your network, and an ISP to provide Internet service at the other end of the circuit.

All of this effort and its associated cost gets you 1.544Mbps of bandwidth to the Internet. If you suddenly need more, you'll have to repeat this arduous process for another T1 or possibly higher bandwidth line. There are several measures you can take to make sure your organization has enough flexibility in its bandwidth solutions as well as the foresight to predict problems before they arise.

Watch Long-Term Trends in Network Traffic

Trend analysis is an important part of system administration, and nowhere is this more obvious than in network planning. A small company with 16 employees might start out with a single T1 to the Internet for Web and email functionality, realizing that their average utilization will rarely approach the 1.544Mbps capacity of the line. The company's usage will grow with the company, however, and eventually that same 1.544Mbps will be a bottleneck for the company's communications. The astute system administrator monitors the bandwidth utilization over time and therefore can predict the approximate point at which more bandwidth is necessary.

Figure 11.4 shows an example of one year's worth of bandwidth utilization on a fractional 768Kbps T1 line. Since late January, utilization has steadily increased (when read right to left), but is still nowhere near the capacity of the T1 line (98KBps). However, now that the system administrator knows that utilization is increasing, he or she can watch the graph in the next few months to see if that trend continues and recommend more bandwidth before the situation becomes critical. If this increase in utilization can be correlated with other factors, such as new hires, the administrator can predict future bandwidth needs simply by knowing how many people are being hired.

Figure 11.4. An MRTG graph representing one year of bandwidth utilization measurements on a fractional T1 line.


Average Bandwidth Utilization

The average bandwidth utilization on a circuit should be no more than 30%, with peak usage not exceeding 70%. Beyond this, spikes in usage can bring a circuit dangerously close to capacity and degrade the performance of other network traffic on the line.
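
To check where you stand against these guidelines, you can derive utilization from the interface octet counters the same way MRTG does: sample the counter twice, take the difference, multiply by eight to get bits, and divide by the elapsed time and the line rate. The following is a rough sketch using the Net-SNMP command-line tools; the community string, router address, interface index, and line rate are placeholders, and counter wrap is ignored.

#!/bin/sh
# utilization.sh -- approximate inbound utilization of a WAN interface.
ROUTER=192.168.1.2
IFINDEX=2
LINERATE=1544000     # line rate in bits per second (a full T1)
INTERVAL=300         # seconds between samples

# .1.3.6.1.2.1.2.2.1.10 is ifInOctets; -Oqv prints just the counter value
octets() {
    snmpget -v1 -c public -Oqv $ROUTER .1.3.6.1.2.1.2.2.1.10.$IFINDEX
}

A=`octets`
sleep $INTERVAL
B=`octets`

# bits transferred / (interval * line rate), expressed as a percentage
echo "$A $B $INTERVAL $LINERATE" | \
    awk '{ printf "inbound utilization: %.1f%%\n", (($2 - $1) * 8 * 100) / ($3 * $4) }'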


Use Variable-Bandwidth Circuits

With the increasing use of fiber and other high-capacity media, many ISPs are able to offer customers variable-bandwidth solutions, even to their offices. Variable-bandwidth circuits usually offer a set bandwidth to which you agree to limit the majority of your utilization, but the circuits are not physically capped at that limit; you are allowed to “burst” up to a much higher maximum rate for short periods of time. This leeway allows for spikes in traffic; the circuits are therefore called burstable.

Burstable bandwidth is very convenient because it offers you the capacity of a high-bandwidth circuit like a DS-3 for just a little more than the cost of a lower-capacity circuit like a T1. For example, if your expected average utilization on a new T1 is less than 25%, but you occasionally need to transfer files several hundred megabytes in size in less than 5 minutes, you should probably consider burstable bandwidth from your ISP. Many ISPs will charge by bandwidth per interval. For example, on a burstable T1 you might be charged 1 dollar for every 5-minute interval that your utilization exceeds 1.544Mbps.

Summary

Performance tuning and capacity planning are two vast disciplines that go hand in hand, and every system administrator should have a basic understanding of the concepts within each of them. These complex responsibilities are much easier to understand and manage if you categorize them according to the four major subsystems on a server: CPU, storage, memory, and networking. To manage performance tuning and capacity planning for your network, you first must understand what metrics you must compile and how to determine what they tell you about your particular operating system. With this data, you can know what to expect from the normal behavior of your systems. When abnormalities arise, you're then better able to see what system components need to be tuned. You also recognize trends that can help you plan for the future. This chapter merely skimmed the surface of performance tuning and capacity planning, providing a high-level view of very complex topics; several books listed in Appendix A are dedicated to these subjects and can provide more details than you will ever need.