[Hortiwz02] Section 11.10 Gauging Network Performance and Capacity

Source: Baidu Wenku  Editor: Shenma Wenxue Wang  Time: 2024/03/28 22:04:10

Gauging Network Performance and Capacity

Fine-tuning CPU usage, memory, swap, and disk performance will do you no good if your servers cannot be reached over the network. System administrators often take connectivity and bandwidth for granted, assuming one machine will be able to talk to another in a timely fashion.

System administrators must plan network capacity and optimize network performance, just as they do any other system resource. But the network offers special challenges, because it is a shared medium used by all of the systems connected to it. To optimize network performance, system administrators must learn to gauge specific performance metrics and understand what factors can affect them.

Bandwidth Versus Latency

Network performance is governed by two major factors: bandwidth and latency. In previous sections of this chapter, you learned that bandwidth is the maximum data transfer rate of a connection, usually expressed in bits per second. Latency is the delay between the time a packet is sent from its source and the time it is received by the destination. Latency is independent of the bandwidth of the connection. Many system administrators incorrectly gauge the performance of their network by its bandwidth alone, without considering latency.

ping can be used as a “quick and dirty” method to test the latency of a connection. ping measures the round-trip time of the ICMP echo requests it sends to the destination hosts. In one case, using ping to test the latency on two T1 lines in an office yielded some surprising results. The T1 lines were from different providers; one was a full 1.544Mb/s line while the other was a fractional T1 at 768Kb/s, half the bandwidth of the full T1. Latency was tested by pinging the router on the other side of each T1 line once per second for ten seconds. The tests produced the following results (addresses changed for privacy):

# TEST FULL T1
bash$ ping -s 192.168.0.5 56 10
PING 192.168.0.5: 56 data bytes
64 bytes from 192.168.0.5: icmp_seq=0. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=1. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=2. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=3. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=4. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=5. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=6. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=7. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=8. time=6. ms
64 bytes from 192.168.0.5: icmp_seq=9. time=6. ms

----192.168.0.5 PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms) min/avg/max = 6/6/6

# TEST FRACTIONAL T1
bash$ ping -s 172.16.0.5 56 10
PING 172.16.0.5: 56 data bytes
64 bytes from 172.16.0.5: icmp_seq=0. time=4. ms
64 bytes from 172.16.0.5: icmp_seq=1. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=2. time=4. ms
64 bytes from 172.16.0.5: icmp_seq=3. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=4. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=5. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=6. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=7. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=8. time=3. ms
64 bytes from 172.16.0.5: icmp_seq=9. time=3. ms

----172.16.0.5 PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms) min/avg/max = 3/3/4



Linux ping

The Linux version of ping sends one ICMP echo request every second by default, and does not require the -s flag.


The full T1 had an average round-trip time of 6 milliseconds, while the fractional T1, with half the bandwidth of the full T1, had an average round-trip time of 3 milliseconds. The packets sent over the fractional T1 made the trip in half the time it took the packets to travel on the full T1. These results indicated that latency operated independently of bandwidth in this situation. This outcome is not always the case, though; on very congested lines, bandwidth limitations may delay packets from reaching their destinations, increasing latency.
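A rough calculation suggests why bandwidth alone does not account for these numbers. The Python sketch below estimates the serialization delay, that is, the time needed to clock one ping packet onto each line. It assumes an 84-byte IPv4 packet (56 data bytes plus an 8-byte ICMP header and a 20-byte IP header) and ignores link-layer framing; the function name is illustrative.

```python
def serialization_delay_ms(packet_bytes: int, bits_per_second: float) -> float:
    """Time to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / bits_per_second * 1000

# 56 data bytes + 8-byte ICMP header + 20-byte IPv4 header.
# Link-layer framing overhead is ignored (an assumption of this sketch).
PACKET_BYTES = 56 + 8 + 20

full_t1 = serialization_delay_ms(PACKET_BYTES, 1_544_000)
fractional_t1 = serialization_delay_ms(PACKET_BYTES, 768_000)

print(f"full T1:       {full_t1:.3f} ms per direction")
print(f"fractional T1: {fractional_t1:.3f} ms per direction")
```

Under these assumptions both delays are below one millisecond per direction, so the difference in measured round-trip times has to come from other sources, such as distance, hop count, or router load.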

Latency on the order of a few milliseconds is nothing to be concerned about, but when it begins to rise closer to 100 milliseconds, your network communications will begin to experience significant delays and interruptions. These problems are especially noticeable in interactive applications such as Telnet or SSH, where you expect a character to appear almost immediately after you press a key on your keyboard.

Modem Latency

The average cable modem or DSL broadband connection has latency on the order of tens of milliseconds, while traditional analog modems have latencies in the hundreds of milliseconds. This is one reason why Web surfing over a modem seems so slow.


Many factors contribute to latency, but the usual suspects are the following:

  • Distance

  • Physical medium (such as optical fiber versus copper)

  • Number of hops to the destination

  • Router load

  • Router queuing priorities

Counting Hops

One of the factors that contributes to latency is the number of hops between a packet's source and its destination. A hop is simply a router between the source host and the destination host. The destination host is included in the number of hops, so the number of hops between two hosts on the same network is always one (assuming no bridging). Hops are the units of logical distance on a network, as each hop introduces additional latency to a connection.

Every packet has a TTL (time to live), which is decremented at each hop. If the TTL reaches zero on a router, the router refuses to route the packet any further and returns an ICMP Time Exceeded control message back to the source host. This mechanism prevents infinite routing loops, which would quickly overload a misconfigured router.
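The TTL rule can be modeled in a few lines. The following Python sketch is a simplified simulation, not real packet handling: the function name and the router addresses are illustrative, and each router is assumed to decrement the TTL by exactly one.

```python
def route_packet(ttl: int, routers: list) -> str:
    """Walk a packet through the routers in order, decrementing TTL at each one."""
    for router in routers:
        ttl -= 1
        if ttl <= 0:
            # The router discards the packet and reports back to the source.
            return f"Time Exceeded from {router}"
    return "delivered"

# Three routers sit between the source and the destination (illustrative addresses).
path = ["192.168.1.2", "192.168.10.5", "192.168.6.5"]

for initial_ttl in (1, 2, 3, 4):
    print(initial_ttl, route_packet(initial_ttl, path))
```

Sending packets with a TTL of 1, then 2, then 3, and so on makes each router along the path identify itself in turn, which is the idea traceroute relies on.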

Too many hops can add to latency, so it is often useful to count the number of hops to certain sites using traceroute.

bash$ /usr/sbin/traceroute -n 192.168.5.6
traceroute to 192.168.5.6 (192.168.5.6), 30 hops max, 40 byte packets
1 192.168.1.2 3.711 ms 3.331 ms 2.609 ms
2 192.168.10.5 5.693 ms 5.715 ms 5.670 ms
3 192.168.6.5 5.615 ms 5.534 ms 5.516 ms
4 192.168.5.6 5.975 ms 5.837 ms 5.746 ms

There were four hops from the source host to the destination 192.168.5.6. The times reported to the right of each hop are round-trip times for each of three “probes” that traceroute sends to the destination.

Devices are not required to answer requests from ping or traceroute. Most firewalls silently drop these packets, resulting in some skipped hops, shown as asterisks, in the output of traceroute.

$ /usr/sbin/traceroute -n 207.8.173.37
traceroute to 207.8.173.37 (207.8.173.37), 30 hops max, 40 byte packets
1 192.168.1.2 3.030 ms 2.796 ms 2.864 ms
2 208.172.25.1 10.044 ms 9.878 ms 9.985 ms
3 208.172.18.62 10.673 ms 10.508 ms 10.487 ms
4 206.24.194.61 13.901 ms 14.281 ms 12.099 ms
5 206.24.195.226 12.928 ms 13.147 ms 11.980 ms
6 144.232.7.105 12.391 ms 12.928 ms 12.361 ms
7 144.232.9.226 63.675 ms 16.622 ms 198.113 ms
8 144.232.14.138 14.910 ms 16.768 ms 14.996 ms
9 144.232.14.42 15.904 ms 15.791 ms 16.542 ms
10 160.81.19.254 15.841 ms 16.488 ms 16.265 ms
11 207.106.31.34 17.621 ms 16.738 ms 16.431 ms
12 207.8.128.93 21.685 ms 20.174 ms 20.364 ms
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
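Counting hops by eye gets tedious for long traces. The Python sketch below parses saved output in the numeric format shown above (as produced with -n) and reports, for each hop, the responding address and the average of its probe times. The function name is hypothetical, and the sample text is adapted from the first trace with one silent hop added for illustration.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")   # hop number, then the rest of the line
RTT = re.compile(r"([\d.]+) ms")              # each probe time, e.g. "3.711 ms"

def summarize_traceroute(output: str):
    """Return a list of (hop_number, address_or_None, average_rtt_ms_or_None)."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header line and blank lines
        number, rest = int(match.group(1)), match.group(2)
        fields = rest.split()
        address = fields[0] if fields and fields[0] != "*" else None
        rtts = [float(value) for value in RTT.findall(rest)]
        average = sum(rtts) / len(rtts) if rtts else None
        hops.append((number, address, average))
    return hops

# Illustrative sample: hop 3 has been replaced by a silent hop.
sample = """traceroute to 192.168.5.6 (192.168.5.6), 30 hops max, 40 byte packets
1 192.168.1.2 3.711 ms 3.331 ms 2.609 ms
2 192.168.10.5 5.693 ms 5.715 ms 5.670 ms
3 * * *
4 192.168.5.6 5.975 ms 5.837 ms 5.746 ms"""

hops = summarize_traceroute(sample)
for number, address, average in hops:
    if address is None:
        print(f"hop {number}: no reply")
    else:
        print(f"hop {number}: {address} avg {average:.3f} ms")
```

A long run of silent hops at the end of a trace, as in the output above, usually means the probes are being filtered rather than that the path is 30 hops long.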



Measuring Packet Loss

One of the scourges of networking is packet loss, or dropped packets. Packet loss occurs when packets fail to reach their destination, causing delays and retransmissions on the client side. Packet loss can result from a variety of problems, including the following:

  • Broken cables

  • Loose connections

  • Failing network interface

  • Overloaded network interface

  • Malfunctioning switch, firewall, or router

  • Bad routing information

  • Electromagnetic interference on a cable

You can use ping to detect dropped packets, though for low loss rates you may need to run it for quite a while before you actually see evidence of dropped packets. The following output from ping is indicative of a packet loss problem:

bash$ ping -sn 192.168.5.1 56 10
PING 192.168.5.1 (192.168.5.1): 56 data bytes
64 bytes from 192.168.5.1: icmp_seq=0. time=6. ms
64 bytes from 192.168.5.1: icmp_seq=1. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=3. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=4. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=5. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=6. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=8. time=6. ms
64 bytes from 192.168.5.1: icmp_seq=9. time=5. ms

----192.168.5.1 PING Statistics----
10 packets transmitted, 8 packets received, 20% packet loss
round-trip (ms) min/avg/max = 5/5/6

In this output, icmp_seq 2 and 7 are missing: either the requests did not reach 192.168.5.1 or the responses never made it back. The resulting 20% packet loss is very significant; in general, you shouldn't tolerate any packet loss.
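Spotting gaps in the sequence numbers is easy to automate. The Python sketch below scans ping output for icmp_seq values and lists the ones that never came back. The function names are illustrative, and the code assumes sequence numbers start at 0, as in the Solaris output above; Linux ping typically starts at 1.

```python
import re

SEQ = re.compile(r"icmp_seq=(\d+)")

def missing_sequences(output: str, sent: int) -> list:
    """Return the sequence numbers in 0..sent-1 with no reply in the output."""
    seen = {int(number) for number in SEQ.findall(output)}
    return [number for number in range(sent) if number not in seen]

def loss_percent(sent: int, received: int) -> float:
    """Packet loss as a percentage of packets transmitted."""
    return 100.0 * (sent - received) / sent

# Reply lines copied from the ping output above.
replies = """64 bytes from 192.168.5.1: icmp_seq=0. time=6. ms
64 bytes from 192.168.5.1: icmp_seq=1. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=3. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=4. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=5. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=6. time=5. ms
64 bytes from 192.168.5.1: icmp_seq=8. time=6. ms
64 bytes from 192.168.5.1: icmp_seq=9. time=5. ms"""

lost = missing_sequences(replies, sent=10)
print("missing:", lost)
print("loss:", loss_percent(10, 10 - len(lost)), "%")
```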

If you experience dropped packets on one of your WAN interfaces, like a T1 to the Internet, call your provider immediately to have it diagnose the problem. If the problem is internal to your own network, make sure all of your cables and connections are secure before moving on to your routers and switches.

Measuring Network Errors

When a system detects an error in network communications to or from one of its interfaces, it updates counters in the kernel. You can display these counters with netstat -ni. On Solaris the output looks like this:

bash$ netstat -ni
Name  Mtu   Net/Dest     Address       Ipkts   Ierrs  Opkts  Oerrs  Collis  Queue
lo0   8232  127.0.0.0    127.0.0.1     246     0      246    0      0       0
hme0  1500  192.168.1.0  192.168.1.27  291389  3      17460  0      0       0



Ierrs are input errors and Oerrs are output errors. Red Hat Linux displays the counters a bit differently, as follows:

bash$ netstat -ni
Kernel Interface table
eth0 Link encap:Ethernet HWaddr 00:10:B5:96:0A:5E
inet addr:192.168.1.66 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12026963 errors:3 dropped:0 overruns:0 frame:0
TX packets:5713105 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
Interrupt:16 Base address:0x1000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:3924 Metric:1
RX packets:100713 errors:0 dropped:0 overruns:0 frame:0
TX packets:100713 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

The fast Ethernet interface hme0 on the Solaris server (the first example) had three input errors out of 291,389 total packets received by the interface, for an error rate of 0.001% (one error per 100,000 packets). However, you have no way of determining when these errors happened. To avoid this problem, you may want to run this command with an interval in seconds as the last argument to view up-to-date statistics every few seconds, much like vmstat.
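The arithmetic behind these figures, and the difference between a lifetime rate and a current rate, can be sketched in Python. The function names are illustrative, and the second counter sample is hypothetical, invented only to show how two readings taken some time apart yield a rate for that interval.

```python
def error_rate_percent(errors: int, packets: int) -> float:
    """Cumulative error rate as a percentage of packets seen."""
    return 100.0 * errors / packets if packets else 0.0

def interval_error_rate_percent(before, after) -> float:
    """Error rate between two (packets, errors) samples of the same counters."""
    packets = after[0] - before[0]
    errors = after[1] - before[1]
    return error_rate_percent(errors, packets)

# Ipkts and Ierrs for hme0 from the Solaris output above.
lifetime = error_rate_percent(3, 291389)

# Hypothetical later reading: 10,000 more packets, 10 more errors.
recent = interval_error_rate_percent((291389, 3), (301389, 13))

print(f"cumulative:    {lifetime:.4f}%")
print(f"last interval: {recent:.4f}%")
```

In this made-up case the cumulative rate would still look small while the interval rate is roughly a hundred times higher, which is why sampling the counters periodically is more informative than a single reading.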

Don't Accept Errors on Ethernet

Although you can expect a very small error rate on serial interfaces, similar to that of a T1, Ethernet interfaces on local area networks should very rarely (if ever) experience network errors. If you do see errors on an Ethernet interface, check all cables and connections to make sure everything is working as it should.


Network errors are caused by a variety of factors, including the following:

  • Malformed packets

  • Failing interface cards or ports

  • Bad cables

  • Duplex mismatches

Buy a Cable Tester

Punctures in the cable shield, loose pin connections, and poorly twisted pairs all take their toll on a signal traveling across that cable. Use a cable tester to verify that a cable is functioning properly; you can purchase one at many retail computer stores and catalogs.


Physics can also play a role in network errors; interference, crosstalk, and signal degradation are common afflictions on many networks.

Interference occurs when an electromagnetic signal from a device such as a monitor causes a signal to be created in a nearby cable. This signal interferes with the normal signals sent across the cable.

Crosstalk is a type of interference that occurs between two wires transmitting different signals in the same cable. Crosstalk is a problem in cables without twisted-pair wiring.

Twisted-Pair Wiring

Category 5 and 6 cabling, used for Ethernet networking, twists pairs of wires in each cable. Twisting reduces the strength of any interference caused in each pair, especially crosstalk.


Signal degradation is the gradual weakening of a signal as it travels across the length of a cable. The signal can become so weak that it is not recognized at its destination. The natural resistance of copper wiring and external interference are the major contributors to signal degradation.

Maximum Cable Lengths

All network transmission protocols (such as 100Mbps Ethernet) specify the maximum length for supported cables; signals can't be guaranteed to transmit clearly in cables beyond that length. For example, 100Mbps Ethernet has a limit of 100 meters over Category 5 cabling. Compare that with Gigabit Ethernet over fiber-optic cabling, which has a limit of 2 kilometers! This incredible distance capability is due to the lack of interference and signal degradation experienced by fiber-optic cables.


Always Check Cables First

Many unexplainable network errors result from bad cables. Before attempting to diagnose any problem involving network errors, check cables to make sure they are connected properly, are not bent, and do not interleave with power cables. When possible, attempt to correct problems by replacing cables before embarking on a lengthy debugging effort.


Retransmitting the packets that were received with errors usually takes care of these problems, but excessive retransmissions can cause serious performance problems on your network. Therefore, although any examination of a particular network interface is bound to reveal some historical errors, your main concern in error monitoring is to track the current rate at which these errors are occurring.

Real-World Example: Errors on Ethernet

A company had recently been getting complaints from customers regarding the company's Web site. The customers' browsers were hanging sporadically when connecting to the site, at random times and for random URLs. The Web server logs showed no record of a connection from clients during the times reported, so the system administrator suspected a network problem. Using SNMP to query the network interface statistics on the firewall in front of the Web servers revealed an input error rate of 0.1%, or an error in 1 out of every 1,000 packets, on the company's 100Mbps Fast Ethernet link to the Internet. That's a high error rate for any interface. After the ISP confirmed that there were no network problems on its end, the company's system administrator decided to try rebooting the firewall. Amazingly enough, the error rate shot back down to 0%, and the customer complaints stopped. In this case, a moderate error rate didn't cause a major outage, but it was enough to cause some customer hassles with dropped packets. If the administrator had set up a monitor to proactively watch error rates over time (see the next section for this type of monitor), he could have caught this well before the majority of users had a chance to complain.


Measuring Collisions

A collision occurs when two machines on a shared network medium attempt to transmit data at the same time. Each machine detects the collision and retransmits the packet after waiting for a random interval to reduce the chance that the collision will reoccur. Collisions rarely occur on switched Ethernet networks because each port sees only traffic destined for the machine on that port. Collisions, however, happen periodically on hubs, where traffic is broadcast to all ports.

On an unswitched network like that of an Ethernet hub, you should consider a collision rate of over 15% severe enough to warrant further investigation. The common solution is to move to a switched network or to reduce the number of nodes on the unswitched network.

A collision rate of over 5% on a switched network is cause for worry. Fortunately, the usual cause of collisions on a switched network is a duplex mismatch, which you learn about in the following section.
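These thresholds are simple to encode as a check. The Python sketch below assumes the collision rate is computed as collisions divided by output packets (the Collis and Opkts columns of netstat -ni), which is a common convention but is not defined in the text; the function and constant names are illustrative.

```python
HUB_THRESHOLD_PERCENT = 15.0     # unswitched (hub) networks
SWITCH_THRESHOLD_PERCENT = 5.0   # switched networks

def collision_rate_percent(collisions: int, output_packets: int) -> float:
    """Collisions as a percentage of output packets (assumed definition)."""
    return 100.0 * collisions / output_packets if output_packets else 0.0

def needs_investigation(collisions: int, output_packets: int, switched: bool) -> bool:
    """Apply the 5% (switched) or 15% (hub) rule of thumb."""
    threshold = SWITCH_THRESHOLD_PERCENT if switched else HUB_THRESHOLD_PERCENT
    return collision_rate_percent(collisions, output_packets) > threshold

# A 10% collision rate is tolerable on a hub but worrying on a switch.
print(needs_investigation(1000, 10000, switched=False))
print(needs_investigation(1000, 10000, switched=True))
```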

Duplex Considerations

A connection is full-duplex if both sides can transmit data at the same time, and half-duplex if only one side can transmit at once. Full-duplex networks provide the lowest latency, because they enable concurrent transmissions. If both sides of a half-duplex connection transmit at once, a collision occurs; this is the only reason you should see collisions on a switched network. Most modern Ethernet hardware supports full-duplex mode, so you should take advantage of it wherever possible.

Most Ethernet hardware also supports autonegotiation, which automatically determines the speed and duplex of a connection between two network interfaces. Unfortunately, autonegotiation was one of the last specifications put into the Ethernet standard, so many devices do not support it correctly. Incorrect autonegotiation support can result in one side thinking the connection is full-duplex while the other side thinks it is half-duplex, causing errors and collisions on each interface.

You can see the duplex mode of an interface on Solaris by querying its link_mode using ndd; 0 means half-duplex, and 1 means full-duplex, as follows:

# ndd /dev/hme link_mode
1

On Free/Net/OpenBSD, the duplex is listed in ifconfig output.

bash$ ifconfig xl0
xl0: flags=8843 mtu 1500
media: Ethernet 100baseTX full-duplex
status: active
inet 192.168.1.4 netmask 0xffffff00 broadcast 192.168.1.255

On Red Hat Linux, duplex configuration is driver-specific, and there is no standard interface for querying duplex status. The best way to figure out the duplex of an interface in this case is to run dmesg | grep -i duplex, which displays buffered kernel messages containing the word “duplex.” For example,

$ dmesg | grep -i duplex
eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.
