[Horwitz02] Chapter 7. Patches, Upgrades, and Decommissions



You will learn about the following in this chapter:

  • Creating a sandbox environment for testing system changes

  • How to handle common problems when applying patches

  • Managing operating system upgrades

  • Managing hardware upgrades

  • Managing firmware upgrades

  • Best practices for decommissioning services on your Unix network

No matter how well a system has run in the past, the need for change in a Unix network is inevitable. Software bugs emerge, new versions of operating systems become available, and hardware fails. To keep up with these events, system administrators add software patches, upgrade hardware and software, and decommission systems that have become obsolete. Although the actual procedures involved in patching, upgrading, and decommissioning systems are fairly straightforward, any number of problems can occur during those processes that can make the sysadmin's life a nightmare. This chapter examines the logistical side of implementing and managing system and service upgrades on the Unix network, and it offers effective techniques for keeping those pesky problems at bay.

Preimplementation Testing in a Sandbox Environment

Even minor upgrades and changes to operating systems, applications, and hardware can cause major problems with the Unix network's functions. Those problems can wreak havoc on the people using the system, and they can cause you endless headaches when you're busy patching or upgrading a server. One way system administrators avoid this pain is by creating a test environment strictly for testing new technologies and systems before they're put into place. System administrators refer to these testing environments as “sandbox” environments.

As a system administrator, you can use the sandbox as you see fit; you can test patches, upgrade to new operating systems, and swap hardware in and out of servers. Best of all, you can do all of this without affecting users in any way. Because you have time and opportunity to uncover process bottlenecks, pitfalls, bugs, and other potential problems that may accompany the new technology, you can resolve or avoid those issues before they become problems for the entire organization. While it is not required, you should try to mimic your production environment as closely as possible; this ensures that your test results will be representative of the results you would see in production.

Use Identical Operating Systems

If you cannot acquire sandbox hardware that is identical to your production hardware, at the very least load the servers in your sandbox with the same version of the operating system and patch levels as those on your production servers.


Sandbox hardware can be hard to come by. In many small organizations, including start-ups, management is often more concerned with getting production services online and staying within very tight budgets. A testing server that doesn't serve a very obvious and tangible purpose to management won't likely be a high priority. As system administrator, however, you can make some persuasive arguments that might help management understand why the sandbox environment is a priority.

A well-researched budget is a good place to start. Sandbox hardware should reflect the hardware you have in your production environment, but it does not need to be production quality. Make sure you price used equipment and look at offerings from alternate vendors, who may offer lower prices on certain hardware than your preferred vendor; when management doesn't have to spend a lot of money to fulfill a request, they are much more likely to view the request in a positive light.

Next, hit them with your production values; without a sandbox environment, you'll have to deploy new operating systems and patches on the company's production systems without testing first. Let management know that you're concerned about lost time and revenues if something goes wrong in a new deployment. You have no way of predicting every potential problem without testing. If you can present realistic scenarios of such problems and the potential repercussions of their occurrence, management can assess the business benefits of approving the sandbox environment.

In the best-case scenario, you'll get your servers. In the worst case, you'll have documented evidence of your warnings to management when something does go wrong. Subsequent sections in this chapter assume that you do have a sandbox environment.

Patching Operating Systems

Patches go by many names, like updates, service packs, and mods, but these are just more eloquent names for the same thing. Patches for operating systems (OS) and other software exist for various reasons: manufacturers fix bugs, implement new features, and perhaps improve performance. Regardless of the reason, for the Unix system administrator, patching means making changes to critical software, so the administrator needs to apply those patches with care.

New Patch Notification

Keeping up with current patches is an important part of your job and is vital to your systems' stability and security. Many vendors offer email notification or post updates to their support sites when new patches are released, especially for critical patches like security fixes. However, one of the best places to find this information is from the system administrator community itself. Subscribe to security mailing lists and bug mailing lists like BUGTRAQ (http://online.securityfocus.com/cgi-bin/sfonline/subscribe.pl); they usually carry information about patches well ahead of formal announcement by software companies.


Best Practices for Patching Operating Systems

Although you can't guarantee that you won't experience any problems when applying patches, you can follow some guidelines for minimizing your risks. Here are some “best practices” for keeping the patching process as trouble-free as possible:

  • Follow all vendor documentation for applying patches.

  • Examine each patch you are applying; read all online advice and any “README” files that accompany the patch, so you understand the procedures for applying each patch and the risks each patch may pose.

  • Patch in single-user mode if possible, to keep the system in a consistent and predictable state while you install the patch. If single-user mode isn't feasible, kill as many nonessential processes as possible and choose a nonpeak time to patch the system.

  • Many patching processes include an option to save patched files so you can back out of the patch later with a single command (you learn more about this later in the chapter). If you have sufficient disk space, take advantage of this option.

  • Make a full backup of the entire system before applying any patches. Patch back-out mechanisms can't recover files that aren't part of a patch.

  • Use the system's console to apply patches. If the system loses network connectivity, you won't lose your session and possibly interrupt the patching process (see “Overcoming Patch Application Failure” later in this chapter for more on this topic).

  • Test the patching procedure on sandbox hardware first. Verify that the procedure works before applying it to an important system.

  • Take advantage of redundancy by patching only part of a redundant set of servers at one time. If you have two mail servers, patch one at a time; if the patching renders the system unusable, you'll still have one working mail server.
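A few of the practices above, particularly the full pre-patch backup, can be wrapped in a small script. The sketch below assumes a POSIX shell and a working tar; the function name, timestamp format, and backup destination are our own conventions, not part of any vendor's patch tooling:

```shell
#!/bin/sh
# prepatch_backup: archive the named directories into a timestamped
# tarball before any patches are applied, so that files outside the
# patch's own back-out area can still be recovered.
prepatch_backup() {
    dest=$1; shift                      # directory that will hold the backup
    stamp=$(date +%Y%m%d%H%M%S)
    archive="$dest/prepatch-$stamp.tar"
    tar cf "$archive" "$@" || return 1  # back up the named directories
    echo "$archive"                     # report where the backup landed
}
```

In use you might run something like `prepatch_backup /var/backups etc kernel/drv` from `/` on the console, then proceed with `patchadd` or `rpm`.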

Application Patching

The procedures for patching applications are very specific to each application and how you use it. However, many of the methodologies used in operating system patches can be applied to application patches as well.


Though patch application methods vary from system to system, the risks of patching are the same. When you add an OS patch, you're changing the very software that makes the computer run. That's a step you can't take lightly. Several things can go wrong during the patching process. In this section of the chapter, you learn about three of the most common OS patching problems: patch application failure, reboot failure, and bug trading.

Overcoming Patch Application Failure

It is impossible for vendors to predict what a machine will be doing while a patch application is taking place. Maybe there's an Oracle database running on the machine that can't be shut down. Perhaps some files that the patch needs to modify are corrupt. Maybe the system doesn't have enough disk space to complete the patching.

Most vendors recommend that OS patches be applied in single-user mode, or at least while the system is in a semi-idle state with a minimum of processes running, but this condition isn't always possible, especially on servers with no redundancy. And even when you apply patches in the vendor-recommended mode, vendors can't predict what changes you or your software have made to the file system, or even to parts of the operating system itself. Any deviation from vendor recommendations—including unforeseen system changes—can result in a patch application failure.

Patch During Maintenance Windows

If you are unable to take a server down into single-user mode for patching, use your maintenance window to apply the patches. If you timed your window correctly, as described in Chapter 8, “Service Outages,” it should coincide with the most idle time on your system.


Patch application failures can leave the system in an unknown state. Most of the time a patch application fails because of a missing dependency—a Unix system can recover from this failure very easily. However, if a patch application fails during the patching process, some parts of the OS will be changed, while other parts will remain unchanged. The system's unknown state makes its operating condition impossible to predict. The system may be completely functional, partially functional, or the failed patch application may have rendered it completely unusable. In the latter case, only a reload of the operating system can repair the damage, but in most cases the back-out capabilities of your patching software will resolve the problem for you.

Check Hardware Before Applying Patches

If critical hardware like a disk or CPU fails during a patch application, or if the power fails, patch applications could be stopped in their tracks, resulting in partially applied patches and a potentially corrupted system. Verify the condition of your hardware before you apply any patches, in order to prevent this situation from happening to your servers.


If you suffer a patch application failure, first examine any error messages the system is displaying, and fix the problems associated with those errors if you can. Some common errors that can occur during patch applications, and their remedies, are listed here:

  • Inadequate space on a file system: Remove unnecessary files from the file system and reapply the patch.

  • A file that needs to be replaced is locked: Find the process that owns the lock and kill it; you can then reapply the patch. You can use the fuser command on both Solaris and Red Hat Linux to determine which processes are holding open a particular file.

  • The software you are patching is not installed on your system: This is common when you are installing an entire collection of patches at once—you can ignore this error message.

  • A file is missing that needs to be patched: Reinstall the software that you are patching and then reapply the patch.
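For the locked-file case, `fuser /path/to/file` is the usual first step on both Solaris and Red Hat Linux. As an illustration of what fuser is doing, here is a minimal Linux-only sketch that walks /proc to find the PIDs holding a file open; the function name is our own, and on a real Solaris box you would simply use fuser instead:

```shell
#!/bin/sh
# find_lockers: print the PIDs of processes that have the given file
# open, by scanning each process's open file descriptors in /proc.
find_lockers() {
    target=$(readlink -f "$1")
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink -f "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}            # strip the /proc/ prefix ...
            echo "${pid%%/*}"           # ... and the /fd/N suffix
        fi
    done | sort -un
}
```

Once you have the PIDs, kill the offending processes and reapply the patch.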

Solaris and RPM Calculate Space Requirements for You

The patching mechanisms on both Solaris and Red Hat Linux (RPM and patchadd, respectively) verify that you have sufficient disk space before making any changes to your system. This precaution minimizes the chance that you will apply only part of a patch due to disk space constraints.
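Not every patch tool performs that check, and it never hurts to verify free space yourself before starting. A minimal sketch; the required-megabytes figure would come from the patch README, and the function name is our own:

```shell
#!/bin/sh
# require_space: succeed only if the file system holding $1 has at
# least $2 megabytes free; a cheap preflight check before patching.
require_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')  # available KB, POSIX output
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}
```

For example, `require_space /var 50 || echo "not enough room in /var"` before unpacking a patch bundle into /var.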


If the patching operation continues to fail, contact vendor support to see if the vendor can help you work through the problem with the system it designed.

When all else fails, you must reload the system with a fresh copy of the operating system and add the patch to the fresh copy. Reloading the OS takes a lot of time and carries the risk of data loss, so make sure you've exhausted all the other options before you consider reloading.

Before reloading any system whose stability is questionable, make sure you perform the following tasks to help preserve data and minimize downtime:

  • Back up any important data files stored on local disk. If your backup software is not functional, copy files over the network to another machine (see Chapter 12, “Process Automation,” for instructions on how to do this), or attach a tape drive to copy the files directly to tape.

  • Back up configuration files, especially those in /etc and any application-specific directories. Even if you don't need most of them, it's best to back up everything in case you forget about a particular configuration.

  • Identify locally installed applications that you will need to install again after reloading the system, and verify that you have the installation media.

  • Note the size of each file system and any RAID (Redundant Array of Inexpensive Disks) configurations so they can be reconstructed exactly as they existed before the upgrade. See Chapter 10, “Providing High Availability in Your Unix System,” and Chapter 11, “Performance Tuning and Capacity Planning,” for more information on how to perform these tasks.
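The last of those tasks, recording file system sizes and layout, is easy to script. A sketch that saves the relevant information to a directory you will carry off the machine; the output file names are our own, and the Solaris-specific commands are shown only as comments:

```shell
#!/bin/sh
# snapshot_layout: record the information needed to rebuild file
# systems exactly as they existed before a reload.
snapshot_layout() {
    out=$1
    mkdir -p "$out"
    df -Pk > "$out/df.txt"                  # file system sizes and mount points
    mount > "$out/mounts.txt"               # current mounts and options
    [ -f /etc/fstab ] && cp -p /etc/fstab "$out/"
    # On Solaris you might also capture, for example:
    #   prtvtoc /dev/rdsk/c0t0d0s2 > "$out/vtoc.txt"   # disk partition table
    #   metastat > "$out/metastat.txt"                 # RAID metadevices
}
```

Copy the resulting directory to another machine or to tape before you begin the reload.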

Keep a Tape Drive Handy

Even if you have a large tape library that meets all of your backup needs, you should still keep a small standalone tape drive handy for use on individual systems. You will then be able to back up files in emergency situations when the backup software is not functional or your network is unavailable.


Recovering from Post-Patch Reboot Failure

Almost all systems require a reboot to complete the patching process. The reboot runs the new kernel and its associated device drivers and restarts patched daemons. Though you may think you're home free after a successful patch application, conflicts can arise during the reboot process that can prevent the system from booting at all.

One common cause of post-patch booting problems is the patch's overwriting of critical configuration files. Some patch installations take care not to overwrite existing configuration files, while others aren't so considerate. For example, on Solaris, patch 108984-07 includes the file /kernel/drv/qlc.conf; that file configures QLogic fiber channel cards. If the system administrator changed this configuration file a couple of months before the patch application to support some new disks, when the patch overwrites the file, those disks will be unusable after reboot. If those disks contain the / or /usr file system, the system won't boot at all.
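One defense against this kind of overwrite is to copy any locally modified configuration file aside before patching, then compare and restore it afterward, before the reboot. A minimal sketch; the `.prepatch` suffix and function names are our own conventions:

```shell
#!/bin/sh
# preserve: set a pristine copy of a config file aside before patching.
preserve() {
    cp -p "$1" "$1.prepatch"
}

# restore_if_clobbered: after the patch is applied (and before the
# reboot), put the local version back if the patch replaced it.
restore_if_clobbered() {
    cmp -s "$1" "$1.prepatch" || cp -p "$1.prepatch" "$1"
}
```

With the qlc.conf example, that would look like `preserve /kernel/drv/qlc.conf`, then the patch application, then `restore_if_clobbered /kernel/drv/qlc.conf`.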

Another cause of post-patch failures is software incompatibilities. Any application that depends on the software that you are patching must be able to work with the new patched version of that software. The most common manifestation of this problem occurs when upgrading shared libraries, which contain code that is shared among many different applications to save disk space and memory (see Chapter 11 for more information on shared libraries). When a shared library is upgraded, the upgrade affects every application that depends on the library, and applications can fail because of code incompatibilities or new bugs introduced with the new library.
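Before upgrading a shared library, you can enumerate the applications that will be affected. A hedged sketch using ldd, which is available on both Solaris and Linux; the function name is our own, and the directory and library names in the usage example are placeholders:

```shell
#!/bin/sh
# depends_on_lib: list executables under a directory that link against
# the named shared library, i.e. everything the library upgrade touches.
depends_on_lib() {
    lib=$1; dir=$2
    for bin in "$dir"/*; do
        [ -f "$bin" ] && [ -x "$bin" ] || continue
        ldd "$bin" 2>/dev/null | grep -q "$lib" && echo "$bin"
    done
}
```

For example, `depends_on_lib libresolv /usr/sbin` would show which daemons to retest after a resolver library patch.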

Backups and prepatch research are your best defenses against these kinds of patching problems. In the case of the Solaris patch 108984-07 you read about earlier in this chapter, the README shipped along with the patch goes into great detail about the contents of the patch, as follows:

Patch-ID# 108984-07
Keywords: QLC ISP2200 buffer PCI FiberChannel FCode CPR PM
Synopsis: SunOS 5.8: qlc patch
Date: 21/Jan/01

Solaris Release: 8

SunOS release: 5.8

Unbundled Product:

Unbundled Release:

Topic: SunOS 5.8: qlc patch

NOTE: Refer to Special Install Instructions section for
IMPORTANT specific information on this patch.

BugId's fixed with this patch: 4264323 4278254 4300470 4300943 4300953 4302087 4304897 4319582 4324126 4324180 4324192 4324478 4326893 4327991 4328447 4330730 4334838 4335949 4336664 4336665 4336667 4337688 4344845 4353138 4353797 4353806 4353815 4353831 4353855 4355029 4357943 4360096 4360591 4360623 4363212 4364558 4366910 4367402 4368073 4369500 4375320 4377554 4377565 4380799

Changes incorporated in this version: 4380799

Relevant Architectures: sparc

Patches accumulated and obsoleted by this patch: 110190-01

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/etc/driver_aliases
/etc/driver_classes
/etc/name_to_major
/kernel/drv/qlc
/kernel/drv/qlc.conf
/kernel/drv/sparcv9/qlc
[remainder of file truncated]



This README text clearly explains what the patch does and the files it replaces (including /kernel/drv/qlc.conf). You must make sure that your prepatch backup includes all of the files that will be changed by the patch application. After the patch application is complete, you can replace the rewritten config file with the original.
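The "Files included with this patch" list in the README can drive the prepatch backup directly. A sketch; it assumes GNU tar for the -T option (Solaris tar would need a different invocation), and the function name is our own:

```shell
#!/bin/sh
# backup_listed_files: given a file containing one path per line (as
# copied from a patch README), archive every listed file that actually
# exists on this system.
backup_listed_files() {
    list=$1; archive=$2
    existing=$(mktemp)
    while read -r f; do
        [ -n "$f" ] && [ -e "$f" ] && printf '%s\n' "$f"
    done < "$list" > "$existing"
    tar cf "$archive" -T "$existing"   # GNU tar reads file names from a file
    rm -f "$existing"
}
```

After the patch is applied and tested, extracting the archive restores the pre-patch copies of any files you need back.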

Use Change Management to Stay Familiar with Your Systems

In addition to reading all of the patch README text, it's even more important to stay familiar with your systems so you know when a problem might occur. Keep a record of what processes and software are running on each server as well as any configuration changes you have made. You can use the change management procedures in Chapter 14, “Internal Communication,” to help keep records of what has changed on your system; you can then reference those changes before applying patches, in order to locate potential conflicts.


Bug Trading

Few things in life are more satisfying to a system administrator than finding a patch for a bug in his or her system. The administrator expects to apply the patch and watch the bugs scurry away, and usually that's what happens. But remember that bugs are fixed by changing source code, and changing source code can introduce even more bugs—a situation known as bug trading.

When you encounter a bug trade, it is your responsibility to decide whether it is appropriate to keep the patch or to back it out. You can use the relative severity of each bug to make this decision; if the new bug is less severe than the original bug and doesn't cause as many problems on your system, it may be worth the trouble to keep the patch installed and work around the new bug. However, if the new bug is more severe than the original bug, the patch will end up causing more problems than it solves.

For example, perhaps a patch fixed a performance problem with your system's disk drivers, but now your system is panicking and rebooting itself randomly every couple of hours. That kind of bug is one you won't want to live with, and you will want to back out the patch as soon as possible.

Allow “Burn-In” Time Before Applying Patches

When a new patch is announced, do not immediately apply it to all of your systems. If it is not urgent (for example, your systems are not spontaneously rebooting themselves), wait a few days and allow the patch time to “burn in” in the system administration community. After a few days have passed, check newsgroups, mailing lists, the vendor's Web site, and of course all README files for news of potential problems encountered with the new patch. This will save you the headache of discovering these problems on your own.


Backing Out Patches

If you decide that the patch is more trouble than it's worth, you can use the patching mechanism's back-out options to recover the original files that the patch replaced. In Solaris 8, this happens by default, and you can back out a patch with the patchrm command.
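For software that has no patchrm-style mechanism, you can approximate the save area yourself. A minimal sketch of the idea, not a replacement for vendor tooling; the directory layout is our own convention, and in real use you would run save_for_backout from / with paths relative to it:

```shell
#!/bin/sh
# save_for_backout: before changing files, copy them into a save area,
# preserving their relative paths (a DIY analogue of patchadd's save area).
save_for_backout() {
    savedir=$1; shift
    for f in "$@"; do
        mkdir -p "$savedir/$(dirname "$f")"
        cp -p "$f" "$savedir/$f"
    done
}

# back_out: put every saved file back in place (a DIY analogue of
# patchrm), using a portable tar pipe rather than GNU-only cp options.
back_out() {
    savedir=$1; dest=${2:-/}
    ( cd "$savedir" && tar cf - . ) | ( cd "$dest" && tar xf - )
}
```

The save directory should live on a file system the change cannot touch, or the back-out copies go down with the patch.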

Real-World Example: A Bug for a Bug

Students in a university lab were having problems with fonts on the Solaris 2.6 workstations. All 12-point Times Roman characters on the console were appearing as horizontal lines, making text in that font unreadable. Administrators found a new patch on Sun's Web site, released just days before, that addressed the problem and they promptly applied the patch. A few days later, administrators noticed unusually high CPU load on each of the workstations that were patched. The fbconsole process, which controls the command-line console on Sun Solaris workstations, was running multiple times and consuming massive amounts of CPU time, making the workstations run unbearably slow. Upon reading several newsgroups and consulting with Sun support, the administrators discovered that this problem was a known bug with the current patch that was discovered after the patch had been released. Due to the severity of this new bug, the administrators had to back out the patch and live with the font bug for several more weeks before the next patch came out.


Hardware Upgrades

One of the joys of system administration is the chance to work with new and bleeding-edge technology year after year. But if you want your system to take advantage of emerging technologies, you must upgrade your systems as necessary to meet the demands of those technologies. Some system upgrades are made for performance or compatibility reasons, and some phase out old, soon-to-be-unsupported hardware. An upgrade does not have to involve an entire machine, either; CPUs, memory, disk drives, and expansion cards can all be upgraded individually.

As system administrator, you want all upgrades on your Unix system to go smoothly and efficiently. A smooth hardware upgrade can make everyone in the IT department look like miracle workers, but a poorly executed upgrade can make the department look like a group of bumbling fools. A successful hardware upgrade requires careful upfront planning, to ensure the following:

  • The upgrades will be compatible with the rest of the system hardware and software.

  • The new hardware has adequate capacity to meet system performance standards.

  • All necessary procedures are in place for transitioning users to the new hardware.

Ensuring Hardware Compatibility

All critical software should be compatible with the new hardware; after all, the fastest server in the world isn't of much use if it can't run your applications. Ensuring compatibility between new hardware and existing system software and hardware is rarely a problem if you stay within a certain product line. For example, Sun guarantees that its latest UltraSPARC III processor servers will be able to run all existing software compiled on existing Sun hardware. Ensuring backward compatibility is in the vendors' best interests, as it keeps current customers happy while courting new ones with their new technology.

Compatibility with existing hardware can be a thorn in a system administrator's side as well. For example, if you're upgrading a server connected to a disk array using ultrawide SCSI, make sure the new server supports and comes with an ultrawide SCSI bus, software, and appropriate ports and cabling.

Sun's Switch from SBus to PCI

In the late 1990s, Sun Microsystems began replacing the relatively expensive and nonstandard SBus technology with the less expensive and more standard PCI bus in its workgroup and midrange systems. Organizations that upgraded to this new hardware soon discovered that their old SBus cards for SCSI, Ethernet, video, and other technologies no longer worked in their new systems. In the long run, this was a win for Sun, but it is a good example of how a vendor can cause incompatibilities by switching technologies.


If you've been a system administrator for any length of time, you probably have experienced situations similar to the aforementioned examples, and you can expect to run across other hardware compatibility issues in the future. In general, you should take the following precautions when upgrading hardware to ensure compatibility:

  • Verify that your new hardware is certified for use by your operating system and application vendors. Many vendors post product compatibility matrices on their support Web sites for this very purpose.

  • Ensure that all peripherals attached to hardware you are upgrading are compatible with the new hardware. This includes installing the correct expansion cards (such as a fiber channel card for an external fiber channel disk array) and the appropriate cabling.

  • Operate your new hardware in a test environment before deploying it into production, using the testing methodologies from Chapter 4, “Testing Your Systems.” If you have a hefty, well-funded budget, consider adding the hardware to your sandbox environment for future tests.

  • When in doubt, ask your vendor to verify compatibility for you. All vendors and most authorized resellers should have engineers on staff to help you piece together complex solutions.

Real-World Example: The Wrong SCSI Cables

A system administrator assigned to a backup project assumed that the cables that came standard with a new tape library would be compatible with both the LVD (low voltage differential) SCSI ports on the tape drives and the LVD SCSI card on the backup server. After the parts arrived, the administrator found that the adapters on the cables matched the ports on the tape drives, but they were too large for the ports on the SCSI card, which were smaller due to space limitations. The administrator had to order additional cables, delaying the backup project by a few days. Although the vendor should have recognized this problem when putting together the order, it is ultimately the system administrator who is at fault; he should have thoroughly researched not only the individual hardware components, but the cables that connected them as well.


Ensuring Hardware Has Adequate Capacity

When you upgrade hardware, the new hardware should have the capacity to perform all of the tasks running on the existing hardware. That means it should have enough CPU, memory, disk space, and I/O and network bandwidth to do the job. Usually, upgraded hardware provides this capacity by its very nature—if a hardware upgrade doesn't improve the system's capacity or performance in some way, shape, or form, it isn't much of an upgrade.

Moore's Law Influences the Price of Upgrades

Price also plays a role here. Moore's law (proposed by Gordon Moore, cofounder of Intel Corporation) predicts that processor speeds double every 18 months, and while some would argue that we are approaching the limits of that law using current technologies, it still holds true as of this writing. Combine Moore's law with ever-falling memory and disk prices (you can now buy a DIMM for pennies per megabyte), and the price-to-performance ratio is constantly decreasing in the consumer's favor. This means that for the amount of money you spent on a server this year, you are likely to get a much more powerful server next year.


Not all upgrades increase capacity by adding higher-capacity hardware. Some upgrades divide the existing system into smaller, lower-capacity chunks to increase parallelism and reduce resource contention. For example, a large multi-CPU database server hosting 24 Oracle databases, and reaching its capacity limits, might be replaced with several smaller servers; each of these smaller servers might offer better performance hosting three or four databases than the old server was capable of while hosting all 24 databases. This configuration also eliminates a single point of failure; if the original single database server failed, all 24 databases would become unavailable. In the new scenario, if one server failed, only three or four databases would be affected.

Real-World Example: Spreading the Wealth

A large ISP procured a Sun Enterprise 6000 (E6000) server to host its customer and billing databases in early 1999. After several upgrades, the server had 10GB of RAM, 16 processors, and over 250GB of local disk space. By November 1999, that same server was running over 20 instances of Oracle, and memory and CPU usage skyrocketed with each added instance. Because there was only one server, the E6000 also represented a massive single point of failure; if the server went down, all customer billing and registration came to a grinding halt. Upgrading the CPU or memory on the server was no longer an option. Instead, the IT staff decided to split the workload onto six separate Enterprise 420R servers, each with four 450MHz processors and 4GB of RAM, with the data stored on a storage area network (SAN). This configuration immediately increased the performance of the billing and registration services, both explicitly (more total RAM and CPU), and implicitly (fewer databases per CPU, and the physical separation of databases reduced resource contention). Increasing capacity was as easy as installing an additional E420R or adding space to the SAN.


Following are some important points to take into account when planning for upgrade capacity. You can read more about each of these points in Chapter 11.

  • Ensure that you have enough disks and controllers to accommodate RAID configurations that exist on the old hardware.

  • Assuming your processing needs have not decreased, upgrade one multiprocessor system with another multiprocessor system. If you upgrade to a single-processor machine, you may incur a severe performance hit even if the new processor is faster than both of the old processors; a faster processor cannot replace the concurrency of multiple processors.

  • Ensure that the total network capacity of the new system meets or exceeds that of the old system. Take into account new technologies—such as upgrading 100Mbit Ethernet to Gigabit Ethernet—as well as tricks such as interface trunking.

  • Verify the maximum capacity of each subsystem on your new systems; these limitations may affect the systems you choose for your upgrade. For example, a Sun Enterprise 220R has a maximum memory capacity of 2GB. If your application required 4GB of memory, you would have to purchase an Enterprise 420R, which supports up to 4GB of memory.

  • If you have decided to replace a single system with multiple systems to better distribute the load of your applications, ensure that the total processor capacity, memory, and network bandwidth exceed that which is available on the original system. In fact, many organizations choose to purchase systems identical to the original but of higher capacity, because they already know the performance limitations of their applications on that server configuration.
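A quick inventory of the old system gives you the numbers to compare against when sizing the replacement. A sketch using commands available on Linux; the Solaris equivalents (psrinfo, prtconf) are noted in comments, and the report format is our own:

```shell
#!/bin/sh
# capacity_report: summarize the capacity figures discussed above for
# the current host, for comparison against candidate new hardware.
capacity_report() {
    echo "cpus: $(getconf _NPROCESSORS_ONLN)"              # Solaris: psrinfo | wc -l
    awk '/^MemTotal/ {print "mem_kb: " $2}' /proc/meminfo  # Solaris: prtconf | grep Memory
    df -Pk | awk 'NR > 1 {print "fs: " $6 " " $2 " kB"}'   # file system sizes
}
```

Run the report on the existing server and file the output with your upgrade plan; the network side (interface speeds, trunking) still has to be inventoried by hand.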

Planning a Smooth Transition to Upgraded Hardware

Implementing a smooth transition between the old hardware and the new hardware should be included in the goals of any upgrade. A successful upgrade is one in which end users don't notice the change, let alone experience perceived problems with the new hardware. The last things any administrator wants to worry about during an upgrade are user problems and server uptime.

Some of the more effective ways to achieve this smooth transition include redundancy and hot swapping.

Incorporating Redundancy into the Transition

Redundancy—keeping the system functioning with redundant hardware while new hardware is being installed—is the most straightforward way to smooth any transition to upgraded hardware. You can use redundancy, for example, to upgrade one or more servers by keeping other servers in production servicing user requests. Load-balancing hardware can provide redundancy, too. If you use load-balancing hardware while upgrading two Web servers, you can upgrade one server at a time without an interruption in service. The load-balancing hardware sends all requests to whichever server is running at the time.

Redundancy works on the other end of the cable, too. By outfitting a server with two physical network connections to two separate switches, you can upgrade those switches with absolutely no downtime. When one switch is down, that particular link is no longer used, and all traffic is shuffled off to the other link. As long as there is one switch up and running at all times, there will be no outage.

Techniques for implementing redundancy are discussed in detail throughout Chapter 10.

Hot Swapping

Hot swapping allows you to replace various components of a machine while the system is still running, eliminating downtime altogether. Hot-swappable parts can be found on most enterprise server and storage hardware today. Disk drives and power supplies are the most common hot-swappable parts, though a few platforms support hot-swappable CPU modules and even PCI cards.

Although not very useful for entire system upgrades, hot swapping is very useful for upgrading components. For example, a disk failure in a RAID-5 array without hot-swappable disks would require several minutes of downtime in order to replace the failing disk. With hot-swappable disks, however, you can just pull the failed disk out of its slot and insert a new disk without causing any damage to the system or the disks.

Obtaining Hot-Swappable Hardware

Hot swapping is a technology that must be supported by your hardware. Consult your system's documentation or ask your vendor if you are unsure whether certain components of your system are hot-swappable. In general, disks are the most common hot-swappable part, while CPUs, memory, and expansion cards are usually not hot-swappable. Peripherals on USB (universal serial bus)—including disks—are almost always hot swappable; USB peripherals are becoming very popular on Linux systems.


Operating System Upgrades

Unlike patches, operating system upgrades change the entire operating system and usually require an involved installation process. Even though the actual OS upgrade installation process can go quite smoothly, many problems can develop afterward. The following sections document some “best practices” for upgrading an operating system and investigate a few of the more common problems that can occur during the process.

Deciding Whether to Upgrade in Place or Install from Scratch

The actual procedures for upgrading an operating system are very specific to each individual operating system. However, you will usually use one of the following general methods for upgrading:

  • Upgrade the operating system in place.

  • Install the new operating system from scratch.

An in-place operating system upgrade replaces critical system files like the kernel, libraries, and standard Unix utilities with newer versions. It usually leaves your data alone, including home directories and configuration files. These advantages would seem to make an “in-place” upgrade the most logical choice for an upgrade procedure, but this type of upgrade offers some significant disadvantages as well. More often than not, an in-place upgrade replaces a configuration you didn't mean it to replace or changes a shared library, causing several of your applications to fail.

You often find that several small problems like these occur after an in-place upgrade, making your system's behavior unpredictable or “quirky.” Additionally, you may find that various “relics” of the old operating system remain, especially in old configuration files that may no longer be valid. In-place upgrades don't give you the opportunity to repartition your disks to accommodate the additional potential storage needs of the new operating system; this problem is discussed in more detail in “Guaranteeing Adequate Disk Space for the Upgrade,” later in this chapter.
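On RPM-based systems, one class of relics is easy to hunt down: when RPM declines to overwrite a modified configuration file during an upgrade, it leaves the superseded copy as a *.rpmsave file, or writes the new default alongside as *.rpmnew. A minimal post-upgrade sweep might look like the following sketch (the function name is illustrative):

```shell
# list_relics DIR -- report leftover *.rpmsave/*.rpmnew files that an
# in-place upgrade on an RPM-based system may have left behind.  Each
# hit is an old or new config version that deserves a manual review.
list_relics() {
    find "${1:-/etc}" \( -name '*.rpmsave' -o -name '*.rpmnew' \) -print
}
```

Running `list_relics /etc` as root after the upgrade gives you a worklist of configuration files to merge or discard.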

The alternative to an in-place upgrade is to install the new operating system from scratch. This is the only method that guarantees a fresh installation of the operating system with no relics or conflicts from past installations. It also allows you to repartition your disks to reflect the storage needs of the new operating system (if they have changed at all). These are big advantages, but reloading from scratch also comes with the following disadvantages:

  • All of your old configurations will be lost.

  • Many locally installed applications will have to be reinstalled.

  • It will take much longer to perform the upgrade than it would with an in-place upgrade.

You Can Preserve Data from Some File Systems

Most operating system upgrade software will recognize nonsystem file systems and ask if you want them preserved. This is especially useful for file systems dedicated to home directories and software repositories. Even if you choose to preserve some file systems, be sure to make full backups of all of your file systems, just in case something goes wrong and you lose the data on your disk. Installing an operating system is a complex process, and disasters like this happen to everyone at some point.
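As a concrete example of the “just in case” backup, here is a minimal sketch that archives a directory tree with tar before the upgrade. The function name and paths are illustrative, and a real pre-upgrade backup belongs on tape or a remote host, not on the disk you are about to upgrade:

```shell
# back_up_tree SRC DESTDIR -- archive the directory SRC as a dated tar
# file under DESTDIR.  Run it for /etc, home directories, and any local
# software repositories before starting the upgrade.
back_up_tree() {
    src=$1 destdir=$2
    mkdir -p "$destdir"
    tar -cf "$destdir/$(basename "$src").$(date +%Y%m%d).tar" \
        -C "$(dirname "$src")" "$(basename "$src")"
}
```

For example, `back_up_tree /etc /backup` produces `/backup/etc.20020128.tar`, which you can verify with `tar -tf` before proceeding.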


Choosing between an in-place upgrade and reinstalling from scratch is an important decision. In general, if you have most of your data on a shared file system like an NFS file server, it makes sense to reinstall from scratch, as your data won't be affected. If you have installed a lot of critical software on a local disk, though, an in-place upgrade is the most logical decision.

Overcoming Shared Library Incompatibility

Many libraries in /usr/lib, /lib, and various other vendor-specific paths are critical to an application's functionality. When OS libraries like libc and glibc (which contain most of the standard C function code) change, applications running on the operating system can experience massive problems, including segmentation faults, core dumps, data corruption, and all other sorts of nasty things.

Shared libraries almost always change with each release of an operating system. Although backward compatibility is a priority of all vendors, it can't be guaranteed. Therefore, always test applications on a freshly upgraded test server before performing the upgrade on systems that matter.
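On the upgraded test server, the ldd utility gives you a quick first pass: it lists the shared libraries a binary needs and flags any that the runtime linker can no longer resolve. A small sketch (run it against whatever application binaries you are testing):

```shell
# check_libs BINARY -- print the shared libraries BINARY needs that the
# dynamic linker cannot resolve on this system.  Empty output is good;
# any "not found" line means the program will fail at startup.
check_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}
```

A clean `ldd` report is not a full functional test, but an unresolved library is a guaranteed failure, so it is worth checking first.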

Typically, shared library incompatibilities need to be addressed in the applications themselves. Application vendors will rewrite their code to account for each new operating system, if necessary. If you don't want to wait for the vendor, however, you can use old shared libraries on the upgraded system. As long as the applications know where to find the old shared libraries, they won't know the difference.

A Vendor-Supplied Shared Library File

One example of a vendor-supplied solution for retaining old shared libraries is Oracle 8i on Linux. Originally built on Red Hat 6, Oracle 8i requires GLIBC 2.1.3. Unfortunately, Red Hat versions 7 and later come with GLIBC 2.2, which causes errors when Oracle runs. The solution is to install the compat-glibc RPM (a file available from Red Hat's Web site that contains the old shared libraries) on the server. As its name suggests, the compat-glibc RPM exists solely for the purpose of compatibility.


To use old shared libraries with new OS upgrades, you must find the old files and put them where the new version of the OS can find them. If the application called doesitall works with libc.so on your old system, for example, but not the libc.so on the upgraded system, save the old one and put it in a special directory, say, /usr/lib/compat. The environment variable LD_LIBRARY_PATH tells the system where to search for shared libraries beyond the standard system paths and those compiled into the application. You can write a script to replace the original doesitall that forces the application to use the libc.so in /usr/lib/compat, as follows:

#!/bin/sh

# look in /usr/lib/compat first for shared libraries
LD_LIBRARY_PATH=/usr/lib/compat
export LD_LIBRARY_PATH
# run the original program, passing along its command-line arguments
exec /usr/local/bin/doesitall.orig "$@"

Setting this variable in a script can be a cumbersome task. On GLIBC-based systems such as Linux, you can add paths to /etc/ld.so.conf and run ldconfig to accomplish the same thing. In fact, this is the preferred method for adding library search paths on these systems.
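A hedged sketch of that approach: the helper below appends a directory to an ld.so.conf-style file only if it is not already listed, so repeated runs are harmless (the function name is illustrative). After editing the real /etc/ld.so.conf, you would run ldconfig as root to rebuild the linker cache.

```shell
# add_lib_path CONFFILE DIR -- append DIR to an ld.so.conf-style file,
# skipping the append if DIR is already listed so the helper is safe
# to run more than once.
add_lib_path() {
    conf=$1 dir=$2
    grep -qx "$dir" "$conf" 2>/dev/null || echo "$dir" >> "$conf"
}
```

Typical use would be `add_lib_path /etc/ld.so.conf /usr/lib/compat` followed by `ldconfig`, then `ldconfig -p | grep compat` to confirm the old libraries are now visible to the linker.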

Avoiding Problems with Overwritten Configuration Files

As with patches, operating system upgrades can overwrite your valuable configuration files if you're not careful. Well-known files like /etc/passwd, /etc/group, and /etc/shadow are usually left alone by Unix operating system upgrades, as they contain important account information. But other files may not be so safe, depending on the upgrade process. For example, in the upgrade just mentioned, you must be sure to save a backup copy of /etc/resolv.conf, because that file contains name service configuration. You don't want to lose that configuration if the file is overwritten with a default file during the upgrade. (See the earlier advice on making a full backup of any system before upgrading.)

Some operating systems have failsafe measures in place to deal with the problem of overwritten config files. Red Hat's upgrade process, for example, uses RPM to upgrade its packages. RPM packages can identify certain files as configuration files to make sure they're not overwritten during an upgrade. This protection is not required behavior, however, so it is never safe to assume that these files will be preserved during an upgrade. To ensure the preservation of your configuration files, back them up to tape or just copy them to another directory for safekeeping. You can obtain a list of the files designated as configuration files by an RPM package with the following command:

rpm -qcp package-file

The package-file argument is the name of the RPM file, which should end in .rpm.
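That list is easy to feed into a small backup step. The following sketch (the helper name and backup directory are illustrative) copies every existing file named on its standard input into a safekeeping directory:

```shell
# save_configs DESTDIR -- read file names, one per line, on standard
# input and copy each file that exists into DESTDIR, preserving
# permissions and timestamps with cp -p.
save_configs() {
    mkdir -p "$1"
    while read -r f; do
        if [ -f "$f" ]; then
            cp -p "$f" "$1/"
        fi
    done
}
```

Combined with the RPM query above: `rpm -qcp package-file.rpm | save_configs /root/config-backup` stashes every designated configuration file before you upgrade the package.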

Guaranteeing Adequate Disk Space for the Upgrade

Upgrades often include significant new functionality, and that functionality takes up space on disk. Considering that your previous partitioning scheme was probably optimized for the previous version of the operating system, the new version may require more space than is currently available. The following disk usage on a Solaris 8 server, for example, would not be optimal for an upgrade:

bash$ df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s0 191611 170072 2378 99% /
/dev/dsk/c0t0d0s6 4032142 1123219 2868602 29% /usr
/proc 0 0 0 0% /proc
fd 0 0 0 0% /dev/fd
mnttab 0 0 0 0% /etc/mnttab
/dev/dsk/c0t0d0s3 962571 876000 28817 97% /var
swap 1285320 80 1285240 1% /var/run
swap 262144 13048 249096 5% /tmp
/dev/dsk/c0t0d0s4 1987399 307341 1620437 16% /opt

An upgrade to this particular server would be certain to fail, as there are only 2MB of free space left on the root partition. To accommodate additional disk space requirements for OS upgrades, be sure to leave significant free space on the partitions / (root), /usr, and /var.

Because critical system state, parts of the kernel, and system utilities are usually stored in /, it's important to always have some free space there. The /usr partition houses even more system utilities and libraries, and is usually the repository for third-party applications and data. Most administrators allocate the bulk of space on a server to the /usr partition. The /var partition is mostly used for logging and temporary file storage, so you should make some space available on that partition for the upgrade process. The following disk usage is appropriate for the example Solaris 8 server upgrade:

bash$ df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s0 191611 60620 111830 36% /
/dev/dsk/c0t0d0s6 4032142 869552 3122269 22% /usr
/proc 0 0 0 0% /proc
fd 0 0 0 0% /dev/fd
mnttab 0 0 0 0% /etc/mnttab
/dev/dsk/c0t0d0s4 482455 206427 227783 48% /var
swap 1051896 88 1051808 1% /var/run
swap 262144 54648 207496 21% /tmp
/dev/dsk/c0t0d0s5 962571 26532 878285 3% /opt

Planning for future disk capacity needs is a complex process that you learn about in more detail in Chapter 11. However, when all else fails and you cannot reclaim any more space in a file system, you may need to move some of the data to another file system that has extra space available. You can do this in one of the following ways:

  • Create a new file system on an unused partition to store the data. Creating a new file system to temporarily store your data is the cleanest method to reclaim space because it keeps the data from polluting other file systems, and you can mount the file system directly onto the directory whose data it houses. However, this method requires the use of an extra partition, which you may not have available or may have wanted to use for other purposes.

  • Utilize an existing file system and symbolic links. Most Unix systems have file systems that are close to capacity on the same disk as those that are woefully underutilized. Instead of creating a new file system, you can move an entire directory to another file system with more available space and create a symbolic link to point to the new location. For example, if you move the directory /var/bigfiles to /usr/data/bigfiles, you would create a symbolic link at /var/bigfiles that points to /usr/data/bigfiles with the command

    ln -s /usr/data/bigfiles /var/bigfiles
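The move-and-link steps can be wrapped in one small helper so the move and the link never get separated (the function name is illustrative):

```shell
# migrate_dir SRC DEST -- move directory SRC to DEST on a roomier file
# system, then leave a symbolic link at SRC so every existing path
# (scripts, configs, user habits) keeps working unchanged.
migrate_dir() {
    src=$1 dest=$2
    mkdir -p "$(dirname "$dest")"
    mv "$src" "$dest"
    ln -s "$dest" "$src"
}
```

For the example above, `migrate_dir /var/bigfiles /usr/data/bigfiles` performs the move and leaves the compatibility link in one step.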

It is important to revisit the disk space situation after the upgrade is complete; many upgrade procedures ask for more space than is necessary (just to be safe), and some only need the extra space for staging the installations. If, after the migration, you find that the original file system has enough space to store the data you migrated, move the data back to its original location as soon as possible. Storing the files in their original file system avoids cluttering other file systems or frees up partitions that you used if you created a new file system.
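Before kicking off the upgrade itself, a quick preflight check of the critical partitions can save you from an aborted installation. A sketch using the POSIX output mode of df (the thresholds are illustrative; substitute the requirements from your vendor's upgrade notes):

```shell
# need_space MOUNT MIN_KB -- succeed only if the file system holding
# MOUNT reports at least MIN_KB kilobytes available.  df -kP keeps
# each entry on one line, so "available" is always the fourth field.
need_space() {
    avail=$(df -kP "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}
```

You might then run `need_space / 100000 || echo 'WARNING: / is short on space'`, and likewise for /usr and /var, as the first step of your upgrade checklist.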

Ensuring Adequate Driver Support on the Upgraded Operating System

If you've ever hooked up a disk array, tape drive, or printer to your system, chances are that you've used or installed a device driver. Drivers are also used to enhance the kernel with features like new file systems and unsupported network protocols.

Because drivers are the link between the kernel and devices, they usually show a great affinity for the kernel under which they were developed. A network card driver built for the Linux 2.2 kernel will probably not work on Linux 2.4 without recompilation and possibly some code changes.

Broken drivers can bring an upgrade to a grinding halt. If the operating system can't talk to the hardware it's running on, there's really nothing a system administrator can do to solve the problem, short of writing his or her own drivers.

Luckily, most developers and hardware vendors have early access to new operating system releases and can have new drivers ready in time to supplement the general release of the new operating system version. If everything works out, these new drivers prevent the driver incompatibility problem from occurring. If you have to build your own drivers, make sure you verify the vendor's release schedule, so you have time to get the drivers ready before your upgrade.

Firmware Upgrades

Many novice computer users (and beginning system administrators) are surprised to learn that hardware such as network cards, disk drives, and motherboards comes with its own software, and requires that software in order to work at all. Firmware is the term used for software that has been written into ROM (read-only memory) and that accompanies a specific piece of hardware.

The BIOS Is Firmware

One type of firmware that most people are familiar with is the BIOS (basic input/output system) on their PCs. The BIOS is software that contains all the code necessary to run the PC's keyboard, disk drives, monitor, drivers, and other critical functions. The BIOS comes on a ROM chip. On most modern motherboards, BIOS resides in flash ROM—a ROM chip that can be reprogrammed, or “flashed,” with a new version of the BIOS. Flash ROM retains its settings through shutdowns and reboots, so it's perfect for running an application like the BIOS.


Firmware eliminates the need for repeated hardware upgrades by allowing critical hardware functionality to be coded in software. When new functionality is required in existing hardware, or when a bug needs fixing, you can update the hardware's firmware to achieve those goals.

Upgrading flash ROM isn't difficult, but you should follow a few important guidelines.

  • Most firmware is upgraded with a program on the host server; you usually need to run this program as root for raw access to the device.

  • Some ROM chips come with a write lock that prevents unintentional overwriting. This lock is usually a physical switch, similar to a dip switch or a jumper on a circuit board. You have to unset the lock in order to reprogram the ROM.

  • Make sure that hardware has a constant power supply during the firmware upgrade, and then leave the process alone! Flash upgrades are sensitive to interruptions. Firmware contains code critical to the functionality of the hardware; if this code is corrupted in any way, the hardware could fail irreversibly. Just let the upgrade take its course and be on your way, or you might be calling your vendor about a machine that won't boot.

Plan Successful Firmware Updates

When planning the firmware update process, don't forget to take into account the time required to unlock and relock the ROM chip. In each case, you need to physically set a dipswitch or jumper. You might have to remove the case or other system components to access the switch. This process could add five minutes for each machine to your upgrade—additions that can add up to hours for large upgrades. Also, don't forget to reset the switches; once the ROM chip lock is unset, the ROM can be reprogrammed. When you have finished reprogramming the ROM for a firmware upgrade, be sure to reset the lock to safeguard the chip's data.


Real-World Example: Forgetting Minor Details

While planning a firmware upgrade for 16 Sun Enterprise 1 login servers, a system administrator estimated that each machine would take 5 minutes to upgrade. Each machine would be upgraded one at a time, so he planned for a 90-minute outage. However, in calculating the estimated upgrade time for each machine, the administrator forgot about some “minor” details. Each machine had to be shut down, removed from its cabinet, and the write-enable jumper had to be moved into place. The machine was then put back into the cabinet where its firmware was upgraded. Afterwards, the machine was shut down again, removed from its cabinet, opened up to disable the write-enable jumper, and then put back together and reinserted into the cabinet. That amount of work took much more than 5 minutes. In actuality, each machine took between 10 and 15 minutes to upgrade, bringing the total downtime to just over 3 hours—more than double the original 90-minute estimate. End users were frustrated with the extended downtime, and the experience cast an unflattering light on the administrator's planning and implementation abilities. Don't forget the logistics involved with system administration. When planning a major upgrade, always remember to do a dry run on a single machine to get an accurate idea of true downtime.


Decommissioning Services

All services become obsolete sooner or later. Whether the system administrator phases out an application in favor of different software or shuts down a specific system or service entirely, a smooth decommission should be the administrator's ultimate goal. Too many system administrators simply “pull the plug” on services and hardware without properly planning for the transition or the consequences.

You can avoid some of the end-user frustration and system downtime that can accompany a poorly executed service decommission by following a few simple guidelines. By identifying the users of the soon-to-be-decommissioned service and then notifying them of the upcoming transition, you have an opportunity to solicit their cooperation (or at least overcome some reluctance to change) in the transition process. And by planning for a smooth transition with ample provisions for reassigning services or training users in replacement technologies, you help make the transition successful, both for the IT department and the organization it supports.

Identifying Service Users

Do you really know who is using your services? The chances are great that someone is using each server and service that your company currently has in its system. And even the most obscure or specialized service can have a few highly devoted end users.

Nobody likes change; people get used to the services they use, and they want to hang onto them as long as possible. If you don't want a service decommission to be followed by a loud, prolonged howl from frustrated and disappointed end users, you need to make sure you know exactly who is using the service (and how they're using it). Step one in the decommissioning process is to find out who is using the service you're about to decommission. There are several ways to accomplish this task, as follows:

  • Read the documentation for the service. Too often a system administrator will run a system hosting applications he or she knows little or nothing about. After you understand the purpose of a service, you will have a much better idea of who would be using the service.

  • Post a notice in the message of the day announcing the decommission and provide contact information so users can express their concerns.

  • In a corporate environment, send an email to all employees announcing the service decommission. Ask users of the service to respond with a short message saying if and how they use the service and any concerns they have.

  • If your service requires accounts or subscriptions, look in your account database to discover the users who could possibly be using your services. If the service is a simple Unix shell service, this database would be the password file /etc/passwd.

  • Analyze the usage of the service using logs or network traffic sniffers. This is the only foolproof way to find out who is using your services. You can read more about network sniffers in Chapter 13, “Implementing System Security.”
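For the shell-service case, even a one-line pass over the password file narrows the candidate list. A rough sketch (the function name is illustrative, and the pattern deliberately errs on the side of including accounts):

```shell
# list_shell_users [PASSWD_FILE] -- print accounts whose login shell
# ends in "sh" (sh, bash, ksh, csh, zsh, ...), skipping accounts whose
# shell is nologin, false, or another non-shell.  A rough first cut at
# "who could be using this shell server."
list_shell_users() {
    awk -F: '$7 ~ /sh$/ { print $1 }' "${1:-/etc/passwd}"
}
```

Cross-referencing this list against recent logins (for example, with `last`) separates active users from dormant accounts.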

You also need to gather some important information about how people are using the service, so that you understand all of the possible repercussions of the decommission. Ask the following questions:

  • For what are they using the service?

  • Is the service business-critical?

  • Are there supported alternatives available?

  • Should supported alternatives be made available?

You may find out that nobody is using the service, or you may find that a surprisingly large number of people are using it. In fact, most system administrators discover that many people don't know the individual services they use and rely on in their day-to-day business, much less on which servers those services reside. This is especially true among people who have worked at a company for a long time; often, these individuals continue to use services and servers they began using on their first day on the job!

After you know who is using your services and how they are using those services, you can use that information to begin planning the decommission. Here are some planning guidelines:

  • Notify the users of the pending decommission.

  • Collect their comments and suggestions about the timing of the decommission and the functionality of the new service. This information will be important for determining when to completely shut down the original service.

  • Allow the users to participate in any test of the new service; the actual users of a service are best suited to test it.

  • Contact your organization's management and indicate the high-level functions that the service provides. Management will then be able to identify the affected business functions to better assess the impact of the decommission on your business.

You might not be able to root out every user who makes use of a service that you're planning to decommission. In those cases, you have to proceed with the decommissioning and be prepared to deal with the aftermath; or, as one sysadmin has said, “Pull the plug and see who screams.”

Legacy systems can be the most difficult to remove; over time they can work their way into every office and cubicle in the business and become entangled in people's daily work routines. Legacy systems may still serve valid business purposes, but by definition they are not part of your currently deployed technology infrastructure. A legacy system is usually decommissioned for one of two reasons: It is being replaced with new technology or its services are being phased out completely. While you may be tempted to pull the plug on these systems, you should first make sure that your current systems support the functionality of the legacy systems you are decommissioning and give users time to learn how to use the modern incarnation of the service.

The fate of the services on any legacy system you are decommissioning determines the steps you need to take in the decommissioning process; a complete phasing out of a service may simply require shutting down a server, but if a service is being replaced with one on a new system, you will need a transitional phase to ease end users into the new service. The following sections will guide you through this transition.

Real-World Example: An Untimely Death

Narya was the first server at a small ISP, reluctantly purchased in response to the technical staff's need for a server. Because the company wanted to minimize its investment, narya was small and housed all of the company's services: corporate email, billing, the company Web site, and home directories. As the company grew and acquired more staff and more appropriate corporate technology infrastructure, the system administrator migrated services off narya. Eventually, narya became a legacy server, used only to house some old data. Administrators neglected to maintain the server or to monitor its use. Four years after its deployment, narya died. After many attempts to revive the server, including new power supplies, the IT staff gave up, and decommissioned the server. Unfortunately, at least 20 employees, including the CEO, president, board members, and several original clients, had their email routed through narya, a relic of the old days. After the server's decommissioning, email sent to these individuals bounced back to the sender—a situation no one was happy about. Administrators scrambled to recover all of the address information from disks and backups, and a week later, all email was routed to the appropriate corporate or client mail server. Finally, the disaster was over, but the administrators learned to always audit the services on all of their servers; you never know who is depending on that dusty little server in the corner!


Notifying the Users

The people who are most affected by decommissioning a service are its users, so it's important to keep them involved in the whole process. Swapping out a service without informing its users is considered very bad form (both by users and your management), even if the service is being replaced with a similar service. The abrupt loss of a service angers users who depend on the service and may interrupt critical business functions in your organization. No matter how much confidence you have in the new service, if something goes wrong and nobody knew about your plans ahead of time, you'll experience the wrath of both your users and your management; if you inform them ahead of time, they will at least be prepared for any potential problems.

Give users plenty of notice and get approval from management before decommissioning anything. Your management should also be able to provide you with an acceptable lead time between the announcement and the actual decommission; the lead time depends on many factors, including the number of end users and the system's importance to your organization's business. A simple notice like the following is all you need to effectively announce an upcoming service decommissioning:

From: Mike Admin
To: All Staff
Subject: Telnet Access
Effective Monday January 28 at 7 AM, access to corporate servers via telnet
will no longer be permitted. Please use the new SSH client available from the
IT home page. Detailed instructions are included in the attached document.
Please contact the IT department at helpdesk@example.com if you experience any
problems.

Thank you,

The IT Staff



Arrange for Backups of the Original Service's Data

Before beginning any decommission, ensure that you have (or plan to take) full backups of the original service's configurations and data. Some data is bound to be lost or configurations forgotten on the new system, and these will not be available after the old service is decommissioned and its systems have been repurposed.


Easing into Any Transition

Most users do not react well to abrupt changes; they need a transitional period in which they can adjust to the usage and quirks of any new services. This transitional period can range from a few hours to a few years, depending on the scope of the service you are offering and the urgency of the change.

The most publicized decommission in progress as of this writing is the transition from traditional analog television broadcasts to digital broadcasts. Currently, the process is in its infant stages, but the FCC (Federal Communications Commission) has declared that all NTSC (National Television System Committee) broadcasts, which are the analog standard in 2002, be phased out completely in favor of digital broadcasts by 2006.

The transition from analog to digital technology means major changes not only for television networks, but also for consumers; every new television manufactured and sold in the country must be able to receive digital broadcasts by 2006. Considering the millions of television sets consumers buy every year in this country, the transition represents a major task. Digital TV broadcasts began in 1998, so the FCC has implemented an eight-year transitional period for consumers and television networks to adopt the new technology. That's a long time, but it gives everyone a chance to catch up and adjust to the new technology in time for the official decommission.

The same kind of careful planning and transition phase is essential for a successful IT service decommissioning project. Corporate users need information, assistance, and ample time to adjust to the adoption of new technologies or the loss of old technologies to which they've become accustomed. In general, there are four major steps to follow during a transition:

  • Notify end users and management of the impending decommission. Include the services that are being decommissioned, the time of the decommission, and the length of any transition period in which the old services can still be accessed.

  • Install and configure the replacement systems ahead of time and test their functionality and stability.

  • Allow end users to test the new system before the decommission takes place.

  • Provide ample transition time for training and system tweaking before the old system is turned off for good.

Consider the example of an IT department in a company whose employees traditionally have used Telnet to connect to the corporate network from home. The company's CTO has decided that using Telnet is too insecure and limited. The CTO declares that a VPN (virtual private network) solution should be in place within three months, at which time Telnet access is to be disabled.

Two months is ample time for system administrators to have a VPN solution in place. But they don't rush the system into use immediately. Decommissioning Telnet abruptly won't give the other staff much time to adjust. Instead, the IT department sends out informational notices concerning the eventual Telnet decommissioning, beginning early in the three-month transition period the CTO has declared. Then, after the new VPN system is ready, the IT department uses the final month of the transition period to slowly phase out Telnet.

During the month-long end-user transition, IT issues VPN clients to staff accompanied by detailed instructions. IT also arranges VPN software training for desktop support staff, so they can provide support to the uninitiated new users. During this one-month transition, Telnet access is left open while everyone gets used to the VPN. This “dual service” phase also leaves a proven and viable remote access solution in place during the transition; if users begin to experience problems with the VPN, they can fall back to Telnet while administrators fix it. After the month is up, an announcement can be made that Telnet is finally being decommissioned.
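During the dual-service month, the logs tell you how the phase-out is going. Assuming in.telnetd logs each connection through syslog, as it typically does under TCP wrappers, with lines ending in the client host name, a sketch that tallies the lingering Telnet users per host:

```shell
# count_telnet_hosts [LOGFILE] -- tally Telnet connections per client
# host from a syslog file, busiest hosts first.  Assumes tcpd-style
# lines such as "... in.telnetd[123]: connect from pc1.example.com";
# adjust the awk field if your log format differs.
count_telnet_hosts() {
    grep 'in\.telnetd' "${1:-/var/log/messages}" |
        awk '{ print $NF }' | sort | uniq -c | sort -rn
}
```

When the counts drop to a handful of stragglers, a targeted reminder to those users usually finishes the job before the final cutoff.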

Create a FAQ for the Transition

During the transitional phase, you may discover problems and questions that are common to most users. Create a FAQ that will provide answers and solutions to the end users and post it somewhere easily accessible, such as a company intranet, or distribute it via email.


Decommission, Don't Destroy

When most people finish reading a good book, they might put it back on a bookshelf or maybe even pack it away in a box, but rarely do they toss the book in the trash. You can think of decommissioned services as being similar to those used books: Even though you're no longer using a service, you don't rush to destroy its data or break its server down for spare parts, at least not immediately. What happens if something goes wrong with your new deployment a day, a week, or even a month down the road? What if you need data from the old server that isn't in your current backup library?

It's a good idea to keep the infrastructure from the old service up and running, or at least available to administrators, for a period of time after its decommission. With the old infrastructure on reserve, you can still retrieve data from or even revert back to the original service if something goes horribly wrong with the new service. When you've had ample time to determine that the old service is completely obsolete and supplanted by the new service, you can repurpose the old infrastructure.
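As a minimal sketch of keeping a decommissioned service's data on reserve (the host name `oldserver` and the directory paths are hypothetical stand-ins), the old machine's data and configuration might be bundled into a dated archive and verified before the hardware is repurposed:

```shell
# Hypothetical sketch: archive a decommissioned server's data and
# configuration before repurposing it. Names and paths are stand-ins.
host=oldserver
stamp=$(date +%Y%m%d)

# Stand-in directories for the old service's data and configuration.
mkdir -p "$host/data" "$host/etc" archive
echo "customer records" > "$host/data/records.txt"
echo "daemon config"    > "$host/etc/service.conf"

# Bundle everything into a dated, compressed archive.
tar cf - "$host" | gzip > "archive/${host}-${stamp}.tar.gz"

# Verify the archive reads back cleanly before wiping or
# repurposing the original machine.
gzip -dc "archive/${host}-${stamp}.tar.gz" | tar tf -
```

The verification step matters: an archive that was never test-read is no safer than no archive at all if you need that data a month down the road.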

Summary

Patches, upgrades, and decommissions are not very exciting parts of a system administrator's daily routine. Too often, system administrators ignore these basic system maintenance functions or perform them casually. As this chapter has explained, all changes to a Unix network require careful planning, testing, end-user education, implementation, and follow-up support. To plan and implement a successful system upgrade or decommission, follow these general rules:

  • Read all vendor documentation.

  • Know who is using the services on your network.

  • Test everything in a sandbox environment first. If you don't have a sandbox, lobby for one using the arguments presented earlier in this chapter.

  • Notify users and management of upcoming changes.

  • Back up your systems completely before doing anything!