[Nemeth10] 2.4. Perl programming

Perl, created by Larry Wall, was the first of the truly great scripting languages. It offers vastly more power than bash, and well-written Perl code is quite easy to read. On the other hand, Perl does not impose much stylistic discipline on developers, so Perl code written without regard for readability can be cryptic. Perl has been accused of being a write-only language.

Here we describe Perl 5, the version that has been standard for the last decade. Perl 6 is a major revision that’s still in development. See perl6.org for details.

Either Perl or Python (discussed starting on page 66) is a better choice for system administration work than traditional programming languages such as C, C++, C#, and Java. They can do more, in fewer lines of code, with less painful debugging, and without the hassle of compilation.

Language choice usually comes down to personal preference or to standards forced upon you by an employer. Both Perl and Python offer libraries of community-written modules and language extensions. Perl has been around longer, so its offerings extend further into the long tail of possibilities. For common system administration tasks, however, the support libraries are roughly equivalent.

Perl’s catch phrase is that “there’s more than one way to do it.” So keep in mind that there are other ways of doing most of what you read in this section.

Perl statements are separated by semicolons.[11] Comments start with a hash mark (#) and continue to the end of the line. Blocks of statements are enclosed in curly braces. Here’s a simple “hello, world!” program:

[11] Since semicolons are separators and not terminators, the last one in a block is optional.

#!/usr/bin/perl
print "Hello, world!\n";

As with bash programs, you must either chmod +x the executable file or invoke the Perl interpreter directly.

$ chmod +x helloworld
$ ./helloworld
Hello, world!

Lines in a Perl script are not shell commands; they’re Perl code. Unlike bash, which lets you assemble a series of commands and call it a script, Perl does not look outside itself unless you tell it to. That said, Perl provides many of the same conventions as bash, such as the use of back-ticks to capture the output from a command.

Variables and arrays

Perl has three fundamental data types: scalars (that is, unitary values such as numbers and strings), arrays, and hashes. Hashes are also known as associative arrays. The type of a variable is always obvious because it’s built into the variable name: scalar variables start with $, array variables start with @, and hash variables start with %.

In Perl, the terms “list” and “array” are often used interchangeably, but it’s perhaps more accurate to say that a list is a series of values and an array is a variable that can hold such a list. The individual elements of an array are scalars, so like ordinary scalar variables, their names begin with $. Array subscripting begins at zero, and the index of the highest element in array @a is $#a. Add 1 to that to get the array’s size.

The array @ARGV contains the script’s command-line arguments. You can refer to it just like any other array.

The following script demonstrates the use of arrays:

#!/usr/bin/perl

@items = ("socks", "shoes", "shorts");
printf "There are %d articles of clothing.\n", $#items + 1;
print "Put on ${items[2]} first, then ", join(" and ", @items[0,1]), ".\n";

The output:

$ perl clothes
There are 3 articles of clothing.
Put on shorts first, then socks and shoes.

There’s a lot to see in just these few lines. At the risk of blurring our laser-like focus, we include several common idioms in each of our Perl examples. We explain the tricky parts in the text following each example. If you read the examples carefully (don’t be a wimp, they’re short!), you’ll have a working knowledge of the most common Perl forms by the end of this chapter.

Array and string literals

In this example, notice first that (...) creates a literal list. Individual elements of the list are strings, and they’re separated by commas. Once the list has been created, it is assigned to the variable @items.

Perl does not strictly require that all strings be quoted. In this particular case, the initial assignment of @items works just as well without the quotes.

@items = (socks, shoes, shorts);

Perl calls these unquoted strings “barewords,” and they’re an interpretation of last resort. If something doesn’t make sense in any other way, Perl tries to interpret it as a string. In a few limited circumstances, this makes sense and keeps the code clean. However, this is probably not one of those cases. Even if you prefer to quote strings consistently, be prepared to decode other people’s quoteless code.

The more Perly way to initialize this array is with the qw (quote words) operator. It is in fact a form of string quotation, and like most quoted entities in Perl, you can choose your own delimiters. The form

@items = qw(socks shoes shorts);

is the most traditional, but it’s a bit misleading since the part after the qw is no longer a list. It is in fact a string to be split at whitespace to form a list. The version

@items = qw[socks shoes shorts];

works, too, and is perhaps a bit truer to the spirit of what’s going on. Note that the commas are gone since their function has been subsumed by qw.

Function calls

Both print and printf accept an arbitrary number of arguments, and the arguments are separated by commas. But then there’s that join(...) thing that looks like some kind of function call; how is it different from print and printf?

In fact, it’s not; print, printf, and join are all plain-vanilla functions. Perl allows you to omit the parentheses in function calls when this does not cause ambiguity, so both forms are common. In the print line above, the parenthesized form distinguishes the arguments to join from those that go to print.

We can tell that the expression @items[0,1] must evaluate to some kind of list since it starts with @. This is in fact an “array slice” or subarray, and the 0,1 subscript lists the indexes of the elements to be included in the slice. Perl accepts a range of values here, too, as in the equivalent expression @items[0..1]. A single numeric subscript would be acceptable here as well: @items[0] is a list containing one scalar, the string “socks”. In this case, it’s equivalent to the literal ("socks").

Arrays are automatically expanded in function calls, so in the expression

join(" and ", @items[0,1])

join receives three string arguments: “ and ”, “socks”, and “shoes”. It concatenates its second and subsequent arguments, inserting a copy of the first argument between each pair. The result is “socks and shoes”.
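To see this expansion in isolation, here is a minimal sketch. The helper count_args is hypothetical, introduced only for illustration, and the sketch enables strict and warnings, which the examples in this section do not.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args {
    return scalar(@_);
}

my @items = ("socks", "shoes", "shorts");

# The slice expands into two scalars, so three arguments arrive in all.
my $n = count_args(" and ", @items[0,1]);
print "count_args saw $n arguments\n";

# join receives the same three arguments.
my $phrase = join(" and ", @items[0,1]);
print "$phrase\n";
```

The callee cannot tell whether its arguments were written out one by one or came from an array; it sees only a flat list.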

Type conversions in expressions

In the printf line, $#items + 1 evaluates to the number 3. As it happens, $#items is a numeric value, but that’s not why the expression is evaluated arithmetically; "2" + 1 works just as well. The magic is in the + operator, which always implies arithmetic. It converts its arguments to numbers and produces a numeric result. Similarly, the dot operator (.), which concatenates strings, converts its operands as needed: "2" . (12 ** 2) yields “2144”.
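The short sketch below restates these conversions as runnable code. The variable names are ours, chosen only for illustration.

```perl
use strict;
use warnings;

my $sum    = "2" + 1;            # + converts the string "2" to a number
my $joined = "2" . (12 ** 2);    # . converts the number 144 to a string
my $both   = ("3" . "4") + 1;    # concatenate to "34", then add 1

print "sum is $sum\n";           # sum is 3
print "joined is $joined\n";     # joined is 2144
print "both is $both\n";         # both is 35
```

In each case the operator, not the operands, decides whether the value is treated as a number or a string.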

String expansions and disambiguation of variable references

As in bash, double-quoted strings are subject to variable expansion. Also as in bash, you can surround variable names with curly braces to disambiguate them if necessary, as with ${items[2]}. (Here, the braces are used only for illustration; they are not needed.) The $ clues you in that the expression is going to evaluate to a scalar. @items is the array, but any individual element is itself a scalar, and the naming conventions reflect this fact.

Hashes

A hash (also known as an associative array) represents a set of key/value pairs. You can think of a hash as an array whose subscripts (keys) are arbitrary scalar values; they do not have to be numbers. But in practice, numbers and strings are the usual keys.

Hash variables have % as their first character (e.g., %myhash), but as in the case of arrays, individual values are scalar and so begin with a $. Subscripting is indicated with curly braces rather than square brackets, e.g., $myhash{'ron'}.

Hashes are an important tool for system administrators. Nearly every script you write will use them. In the code below, we read in the contents of a file, parse it according to the rules for /etc/passwd, and build a hash of the entries called %names_by_uid. The value of each entry in the hash is the username associated with that UID.

#!/usr/bin/perl

while ($_ = <>) {
    ($name, $pw, $uid, $gid, $gecos, $path, $sh) = split /:/;
    $names_by_uid{$uid} = $name;
}
%uids_by_name = reverse %names_by_uid;

print "\$names_by_uid{0} is $names_by_uid{0}\n";
print "\$uids_by_name{'root'} is $uids_by_name{'root'}\n";

As in the previous script example, we’ve packed a couple of new ideas into these lines. Before we go over each of these nuances, here’s the output of the script:

$ perl hashexample /etc/passwd
$names_by_uid{0} is root
$uids_by_name{'root'} is 0

The while ($_ = <>) reads input one line at a time and assigns it to the variable named $_; the value of the entire assignment statement is the value of the righthand side, just as in C. When you reach the end of the input, the <> returns a false value and the loop terminates.

To interpret <>, Perl checks the command line to see if you named any files there. If you did, it opens each file in sequence and runs the file’s contents through the loop. If you didn’t name any files on the command line, Perl takes the input to the loop from standard input.

Within the loop, a series of variables receive the values returned by split, a function that chops up its input string by using the regular expression passed to it as the field separator. Here, the regex is delimited by slashes; this is just another form of quoting, one that’s specialized for regular expressions but similar to the interpretation of double quotes. We could just as easily have written split ':' or split ":".

The string that split is to divide at colons is never explicitly specified. When split’s second argument is missing, Perl assumes you want to split the value of $_. Clean! Truth be told, even the pattern is optional; the default is to split at whitespace but ignore any leading whitespace.
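The sketch below contrasts the fully explicit form of split with the fully defaulted one. The helper passwd_fields and the sample strings are hypothetical, used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a passwd-style line into its fields,
# naming both the pattern and the string explicitly.
sub passwd_fields {
    my ($line) = @_;
    chomp $line;
    return split /:/, $line;
}

my @f = passwd_fields("root:x:0:0:root:/root:/bin/bash\n");
print "user=$f[0] uid=$f[2] shell=$f[6]\n";

# With no arguments at all, split works on $_ and splits at
# whitespace, discarding any leading whitespace.
for ("   huey dewey  louie") {
    my @words = split;
    print scalar(@words), " words: @words\n";
}
```

The second loop runs exactly once; it is there only to set $_ for the bare split.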

But wait, there’s more. Even the original assignment of $_, back at the top of the loop, is unnecessary. If you simply say

while (<>) {

Perl automatically stores each line in $_. You can process lines without ever making an explicit reference to the variable in which they’re stored. Using $_ as a default operand is common, and Perl allows it more or less wherever it makes sense.

In the multiple assignment that captures the contents of each passwd field,

     ($name, $pw, $uid, $gid, $gecos, $path, $sh) = split /:/;

the presence of a list on the left hand side creates a “list context” for split that tells it to return a list of all fields as its result. If the assignment were to a scalar variable, for example,

$n_fields = split /:/;

split would run in “scalar context” and return only the number of fields that it found. Functions you write can distinguish between scalar and list contexts, too, by using the wantarray function. It returns a true value in list context, a false value in scalar context, and an undefined value in void context.
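Here is a minimal sketch of a context-sensitive routine. The function name context is hypothetical; the second half repeats the split comparison from the text.

```perl
use strict;
use warnings;

# Hypothetical routine that reports the context it was called in.
sub context {
    return "list"   if wantarray;           # true in list context
    return "scalar" if defined wantarray;   # false but defined
    return "void";                          # undefined
}

my @list   = context();    # list assignment gives list context
my $scalar = context();    # scalar assignment gives scalar context
print "$list[0] $scalar\n";

# split behaves differently in the two contexts.
$_ = "root:x:0:0:root:/root:/bin/bash";
my @fields   = split /:/;
my $n_fields = split /:/;
print "$fields[0] is the first of $n_fields fields\n";
```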

The line

%uids_by_name = reverse %names_by_uid;

has some hidden depths, too. A hash in list context (here, as an argument to the reverse function) evaluates to a list of the form (key1, value1, key2, value2, ...). The reverse function reverses the order of the list, yielding (valueN, keyN, ..., value1, key1). Finally, the assignment to the hash variable %uids_by_name converts this list as if it were (key1, value1, ...), thereby producing a permuted index.
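The same inversion can be tried on a small literal hash. In this sketch the helper invert and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: swap the keys and values of a hash.
sub invert {
    my %h = @_;
    return reverse %h;    # flatten to a list, reverse it, hand it back
}

my %names_by_uid = (0 => "root", 1 => "daemon", 2 => "bin");
my %uids_by_name = invert(%names_by_uid);

print "$_ => $uids_by_name{$_}\n" for sort keys %uids_by_name;
```

One caveat: if two keys share the same value, only one of them survives the inversion, because hash keys must be unique.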

References and autovivification

These are advanced topics, but we’d be remiss if we didn’t at least mention them. Here’s the executive summary. Arrays and hashes can only hold scalar values, but you will often want to store other arrays and hashes within them. For example, returning to our previous example of parsing the /etc/passwd file, you might want to store all the fields of each passwd line in a hash indexed by UID.

You can’t store arrays and hashes, but you can store references (that is, pointers) to arrays and hashes, which are themselves scalars. To create a reference to an array or hash, you precede the variable name with a backslash (e.g., \@array) or use reference-to-array or reference-to-hash literal syntax. For example, our passwd parsing loop would become something like this:

while (<>) {
    $array_ref = [ split /:/ ];
    $passwd_by_uid{$array_ref->[2]} = $array_ref;
}

The square brackets return a reference to an array containing the results of the split. The notation $array_ref->[2] refers to the UID field, the third member of the array referenced by $array_ref.

$array_ref[2] won’t work here because we haven’t defined an @array_ref array; $array_ref and @array_ref are different variables. Furthermore, you won’t receive an error message if you mistakenly use $array_ref[2] here because @array_ref is a perfectly legitimate name for an array; you just haven’t assigned it any values.

This lack of warnings may seem like a problem, but it’s arguably one of Perl’s nicest features, a feature known as “autovivification.” Because variable names and referencing syntax always make clear the structure of the data you are trying to access, you need never create any intermediate data structures by hand. Simply make an assignment at the lowest possible level, and the intervening structures materialize automatically. For example, you can create a hash of references to arrays whose contents are references to hashes with a single assignment.
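A minimal sketch of that single assignment follows. The routine build_inventory, the host name, and the address are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical routine: build a hash of arrays of hashes in one step.
sub build_inventory {
    my %hosts;    # starts out empty

    # This one assignment creates the hash entry for 'web1', the
    # array it refers to, and the hash stored in the array's slot 0.
    $hosts{'web1'}[0]{'ip'} = "10.0.0.5";

    return \%hosts;
}

my $hosts = build_inventory();
my $outer = ref($hosts->{'web1'});       # ARRAY
my $inner = ref($hosts->{'web1'}[0]);    # HASH
print "web1 holds an $outer whose first element is a $inner\n";
print "IP address: $hosts->{'web1'}[0]{'ip'}\n";
```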

Regular expressions in Perl

You use regular expressions in Perl by “binding” strings to regex operations with the =~ operator. For example, the line

if ($text =~ m/ab+c/) {

checks to see whether the string stored in $text matches the regular expression ab+c. To operate on the default string, $_, you can simply omit the variable name and binding operator. In fact, you can omit the m, too, since the operation defaults to matching:

if (/ab+c/) {

Substitutions work similarly:

$text =~ s/etc\./and so on/g;     # Substitute text in $text, OR
s/etc\./and so on/g; # Apply to $_

We sneaked in a g option to replace all instances of “etc.” with “and so on”, rather than just replacing the first instance. Other common options are i to ignore case, s to make dot (.) match newlines, and m to make the ^ and $ tokens match at the beginning and end of individual lines rather than only at the beginning and end of the search text.
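The effect of each option is easiest to see side by side. The sketch below counts matches in a hypothetical two-line string; the "= () =" construct is a common idiom for counting the matches of a /g pattern.

```perl
use strict;
use warnings;

my $text = "Error: disk full\nerror: retrying\n";    # hypothetical input

my $plain   = () = $text =~ /error/g;       # case-sensitive
my $nocase  = () = $text =~ /error/gi;      # i ignores case
my $perline = () = $text =~ /^error/gim;    # m: ^ matches at each line

# Without s, dot refuses to cross the newline between the two lines.
my $no_s   = ($text =~ /full.error/)  ? "yes" : "no";
my $with_s = ($text =~ /full.error/s) ? "yes" : "no";

print "plain=$plain nocase=$nocase perline=$perline\n";
print "dot crosses newline: without s $no_s, with s $with_s\n";
```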

A couple of additional points are illustrated in the following script:

#!/usr/bin/perl

$names = "huey dewey louie";
$regex = '(\w+)\s+(\w+)\s+(\w+)';

if ($names =~ m/$regex/) {
    print "1st name is $1.\n2nd name is $2.\n3rd name is $3.\n";
    $names =~ s/$regex/\2 \1/;
    print "New names are \"${names}\".\n";
} else {
    print qq{"$names" did not match "$regex".\n};
}

The output:

$ perl testregex
1st name is huey.
2nd name is dewey.
3rd name is louie.
New names are "dewey huey".

This example shows that variables expand in // quoting, so the regular expression need not be a fixed string. qq is another name for the double-quote operator.

After a match or substitution, the contents of the variables $1, $2, and so on correspond to the text matched by the contents of the capturing parentheses in the regular expression. The contents of these variables are also available during the replacement itself, in which context they are referred to as \1, \2, etc.

Input and output

When you open a file for reading or writing, you define a “filehandle” to identify the channel. In the example below, INFILE is the filehandle for /etc/passwd and OUTFILE is the filehandle associated with /tmp/passwd. The while loop condition is <INFILE>, which is similar to the <> we have seen before but specific to a particular filehandle. It reads lines from the filehandle INFILE until the end of file, at which time the while loop ends. Each line is placed in the variable $_.

#!/usr/bin/perl

open(INFILE, "</etc/passwd") or die "Couldn't open /etc/passwd";
open(OUTFILE, ">/tmp/passwd") or die "Couldn't open /tmp/passwd";

while (<INFILE>) {
    ($name, $pw, $uid, $gid, $gecos, $path, $sh) = split /:/;
    print OUTFILE "$uid\t$name\n";
}

open returns a true value if the file is successfully opened, short-circuiting (rendering unnecessary) the evaluation of the die clauses. Perl’s or operator is similar to || (which Perl also has), but at lower precedence. or is generally a better choice when you want to emphasize that everything on the left will be fully evaluated before Perl turns its attention to the consequences of failure.

Perl’s syntax for specifying how you want to use each file (read? write? append?) mirrors that of the shell. You can also use “filenames” such as "/bin/df|" to open pipes to and from shell commands.
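The three shell-like modes can be exercised against a scratch file, as in the sketch below. The routine write_append_read and the file name demo.txt are hypothetical; File::Temp is a module distributed with Perl, used here only to obtain a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical routine: write one line, append another, read both back.
sub write_append_read {
    my ($path) = @_;

    open(OUT, ">$path")  or die "Couldn't open $path for writing: $!";
    print OUT "first\n";
    close(OUT);

    open(OUT, ">>$path") or die "Couldn't open $path for appending: $!";
    print OUT "second\n";
    close(OUT);

    open(IN, "<$path")   or die "Couldn't open $path for reading: $!";
    my @lines = <IN>;    # list context reads every remaining line
    close(IN);

    return @lines;
}

my $dir   = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my @lines = write_append_read("$dir/demo.txt");
print "Read ", scalar(@lines), " lines: ", join("", @lines);
```

The $! variable in the die messages holds the system’s reason for the failure, which makes the error far easier to diagnose.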

Control flow

The example below is a Perl version of our earlier bash script that validated its command-line arguments. You might want to refer to the bash version on page 41 for comparison. Note that Perl’s if construct has no then keyword or terminating word, just a block of statements enclosed in curly braces.

You can also add a postfix if clause (or its negated version, unless) to an individual statement to make that statement’s execution conditional.

#!/usr/bin/perl

sub show_usage {
    print shift, "\n" if scalar(@_);
    print "Usage: $0 source_dir dest_dir\n";
    exit scalar(@_) ? shift : 1;
}

if (@ARGV != 2) {
    show_usage;
} else { # There are two arguments
    ($source_dir, $dest_dir) = @ARGV;
    show_usage "Invalid source directory" unless -d $source_dir;
    -d $dest_dir or show_usage "Invalid destination directory";
}

Here, the two lines that use Perl’s unary -d operator to validate the directory-ness of $source_dir and $dest_dir are equivalent. The second form (with -d at the start of the line) has the advantage of putting the actual assertion at the beginning of the line, where it’s most noticeable. However, the use of or to mean “otherwise” is a bit tortured; some readers of the code may find it confusing.

Evaluating an array variable in scalar context (specified by the scalar operator in this example) returns the number of elements in the array. This is 1 more than the value of $#array; as always in Perl, there’s more than one way to do it.

Perl functions receive their arguments in the array named @_. It’s common practice to access them with the shift operator, which removes the first element of the argument array and returns its value.

This version of the show_usage function accepts an optional error message to be printed. If you provide an error message, you can also provide a specific exit code. The trinary ?: operator evaluates its first argument; if the result is true, the result of the entire expression is the second argument; otherwise, the third.
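The same pattern of shift plus ?: works for any routine with optional arguments. The sketch below uses a hypothetical function, greet, that does not call exit and so is easier to experiment with than show_usage.

```perl
use strict;
use warnings;

# Hypothetical routine: a required name and an optional greeting.
sub greet {
    my $name     = shift;                         # first argument
    my $greeting = scalar(@_) ? shift : "Hello";  # default if none left
    return "$greeting, $name!";
}

my $default = greet("root");
my $custom  = greet("root", "Welcome");
print "$default\n";    # Hello, root!
print "$custom\n";     # Welcome, root!
```

After the first shift, scalar(@_) counts only the arguments that remain, which is what makes the test for an optional second argument work.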

As in bash, Perl has a dedicated “else if” condition, but its keyword is elsif rather than elif. (For you who use both languages, these fun, minute differences either keep you mentally nimble or drive you insane.)

As Table 2.5 shows, Perl’s comparison operators are the opposite of bash’s; strings use textual operators, and numbers use traditional algebraic notation. Compare with Table 2.2 on page 44.

Table 2.5. Elementary Perl comparison operators

String    Numeric   True if
x eq y    x == y    x is equal to y
x ne y    x != y    x is not equal to y
x lt y    x < y     x is less than y
x le y    x <= y    x is less than or equal to y
x gt y    x > y     x is greater than y
x ge y    x >= y    x is greater than or equal to y
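Choosing the wrong family of operators produces surprising results rather than errors. The sketch below, with sample values of our own choosing, shows the same operands compared both ways.

```perl
use strict;
use warnings;

my ($x, $y) = ("10", "10.0");

print "$x == $y: equal as numbers\n"     if $x == $y;
print "$x ne $y: different as strings\n" if $x ne $y;

print "2 < 10 as numbers\n"  if 2 < 10;
print "2 gt 10 as strings\n" if "2" gt "10";    # "2" sorts after "1"
```

All four lines print: the numeric operators compare values, and the string operators compare text character by character.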

In Perl, you get all the file-testing operators shown in Table 2.3 on page 44 except for the -nt and -ot operators, which are available in bash only.

Like bash, Perl has two types of for loops. The more common form iterates through an explicit list of arguments. For example, the code below iterates through a list of animals, printing one per line.

@animals = qw(lions tigers bears);
foreach $animal (@animals) {
    print "$animal \n";
}

The more traditional C-style for loop is also available:

for ($counter=1; $counter <= 10; $counter++) {
    print "$counter ";
}

We’ve shown these with the traditional for and foreach labels, but those are in fact the same keyword in Perl and you can use whichever form you prefer.

Versions of Perl before 5.10 (2007) have no explicit case or switch statement, but there are several ways to accomplish the same thing. In addition to the obvious-but-clunky option of cascading if statements, another possibility is to use a for statement to set the value of $_ and provide a context from which last can escape:

for ($ARGV[0]) {

    m/^websphere/ && do { print "Install for websphere\n"; last; };
    m/^tomcat/    && do { print "Install for tomcat\n";    last; };
    m/^geronimo/  && do { print "Install for geronimo\n";  last; };

    print "Invalid option supplied.\n"; exit 1;
}

The regular expressions are compared with the argument stored in $_. Unsuccessful matches short-circuit the && and fall through to the next test case. Once a regex matches, its corresponding do block is executed. The last statements escape from the for block immediately.

Accepting and validating input

The script below combines many of the Perl constructs we’ve reviewed over the last few pages, including a subroutine, some postfix if statements, and a for loop. The program itself is merely a wrapper around the main function get_string, a generic input validation routine. This routine prompts for a string, removes any trailing newline, and verifies that the string is not null. Null strings cause the prompt to be repeated up to three times, after which the script gives up.

#!/usr/bin/perl

$maxatt = 3; # Maximum tries to supply valid input

sub get_string {
    my ($prompt, $response) = shift;
    # Try to read input up to $maxatt times
    for (my $attempts = 0; $attempts < $maxatt; $attempts++) {
        print "Please try again.\n" if $attempts;
        print "$prompt: ";
        $response = readline(*STDIN);
        chomp($response);
        return $response if $response;
    }
    die "Too many failed input attempts";
}

# Get names with get_string and convert to uppercase
$fname = uc get_string "First name";
$lname = uc get_string "Last name";
print "Whole name: $fname $lname\n";

The output:

$ perl validate
First name: John Ball
Last name: Park
Whole name: JOHN BALL PARK

The get_string function and the for loop both illustrate the use of the my operator to create variables of local scope. By default, all variables are global in Perl.

The list of local variables for get_string is initialized with a single scalar drawn from the routine’s argument array. Variables in the initialization list that have no corresponding value (here, $response) remain undefined.

The *STDIN passed to the readline function is a “typeglob,” a festering wart of language design. It’s best not to inquire too deeply into what it really means, lest one’s head explode. The short explanation is that Perl filehandles are not first-class data types, so you must generally put a star in front of their names to pass them as arguments to functions.

In the assignments for $fname and $lname, the uc (convert to uppercase) and get_string functions are both called without parentheses. Since there is no possibility of ambiguity given the single argument, this works fine.

Perl as a filter

You can use Perl without a script by putting isolated expressions on the command line. This is a great way to do quick text transformations and one that largely obsoletes older filter programs such as sed, awk, and tr.

Use the -pe command-line option to loop through STDIN, run a simple expression on each line, and print the result. For example, the command

ubuntu$ perl -pe 's#/bin/sh$#/bin/bash#' /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/bash
...

replaces /bin/sh at the end of lines in /etc/passwd with /bin/bash, emitting the transformed passwd file to STDOUT. You may be more accustomed to seeing the text substitution operator with slashes as delimiters (e.g., s/foo/bar/), but Perl allows any character. Here, the search text and replacement text both contain slashes, so it’s simpler to use # as the delimiter. If you use paired delimiters, you must use four of them instead of the normal three, e.g., s(foo)(bar).
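For comparison, here is the same substitution written with paired delimiters inside a script. The helper fix_shell and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: change a login shell of /bin/sh to /bin/bash.
# Curly braces are paired delimiters, so there are four of them.
sub fix_shell {
    my ($line) = @_;
    $line =~ s{/bin/sh$}{/bin/bash};
    return $line;
}

my $fixed = fix_shell("daemon:x:1:1:daemon:/usr/sbin:/bin/sh");
print "$fixed\n";

# The $ anchor keeps the pattern from touching paths that merely
# begin with /bin/sh.
my $same = fix_shell("demo:x:9:9:demo:/home/demo:/bin/shutdown");
print "$same\n";
```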

Perl’s -a option turns on autosplit mode, which separates input lines into fields that are stored in the array named @F. Whitespace is the default field separator, but you can set another separator pattern with the -F option.
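Inside a script you can get much the same effect by calling split yourself. The routine below is a rough approximation of what autosplit does to each line, not Perl’s actual implementation; the name autosplit and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical approximation of autosplit: split one line at
# whitespace, or at a separator pattern if the caller supplies one.
sub autosplit {
    my ($line, $sep) = @_;
    chomp $line;
    return split(/$sep/, $line) if defined $sep;
    return split(' ', $line);
}

# Default separator, as with perl -ane
my @F = autosplit("/dev/hda3  3.0G  1.9G  931M  68% /\n");
print join("\t", @F[0..4]), "\n";

# Colon separator, as with perl -F: -ane
my @G = autosplit("root:x:0:0:root:/root:/bin/bash\n", ":");
print "$G[0] has UID $G[2]\n";
```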

Autosplit is handy to use in conjunction with -p or its nonautoprinting variant, -n. For example, the commands below use perl -ane to slice and dice the output from two variations of df. The third line then runs join to combine the two sets of fields on the Filesystem field, producing a composite table that includes fields drawn from both versions of the df output.

suse$ df -h | perl -ane 'print join("\t", @F[0..4]), "\n"' > tmp1
suse$ df -i | perl -ane 'print join("\t", @F[0,1,4]), "\n"' > tmp2
suse$ join tmp1 tmp2
Filesystem Size Used Avail Use% Inodes IUse%
/dev/hda3 3.0G 1.9G 931M 68% 393216 27%
udev 126M 172K 126M 1% 32086 2%
/dev/hda1 92M 26M 61M 30% 24096 1%
/dev/hda6 479M 8.1M 446M 2% 126976 1%
...

A script version with no temporary files would look something like this:

#!/usr/bin/perl

for (split(/\n/, `df -h`)) {
    @F = split;
    $h_part{$F[0]} = [ @F[0..4] ];
}

for (split(/\n/, `df -i`)) {
    @F = split;
    print join("\t", @{$h_part{$F[0]}}, $F[1], $F[4]), "\n";
}

The truly intrepid can use -i in conjunction with -pe to edit files in place; Perl reads the files in, presents their lines for editing, and saves the results out to the original files. You can supply a pattern to -i that tells Perl how to back up the original version of each file. For example, -i.bak backs up passwd as passwd.bak. Beware: if you don’t supply a backup pattern, you don’t get backups at all. Note that there’s no space between the -i and the suffix.

Add-on modules for Perl

CPAN, the Comprehensive Perl Archive Network at cpan.org, is the warehouse for user-contributed Perl libraries. Installation of new modules is greatly facilitated by the cpan command, which acts much like a yum or APT package manager dedicated to Perl modules. If you’re on a Linux system, check to see if your distribution packages the module you’re looking for as a standard feature; it’s much easier to install the system-level package once and then let the system take care of updating itself over time.

On systems that don’t have a cpan command, try running perl -MCPAN -e shell as an alternate route to the same feature:

$ sudo perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.9205)
ReadLine support available (maybe install Bundle::CPAN or Bundle::CPANxxl?)

cpan[1]> install Class::Date
CPAN: Storable loaded ok (v2.18)
CPAN: LWP::UserAgent loaded ok (v5.819)
CPAN: Time::HiRes loaded ok (v1.9711)
... several more pages of status updates ...

It’s possible for users to install Perl modules in their home directories for personal use, but the process isn’t necessarily straightforward. We recommend a liberal policy regarding system-wide installation of third-party modules from CPAN; the community provides a central point of distribution, the code is open to inspection, and module contributors are identified by name. Perl modules are no more dangerous than any other open source software.

Many Perl modules use components written in C for better performance. Installation involves compiling these segments, so you need a complete development environment including the C compiler and a full set of libraries.

As with most languages, the most common error found in Perl programs is the reimplementation of features that are already provided by community-written modules.[12] Get in the habit of visiting CPAN as the first step in tackling any Perl problem. It saves development and debugging time.

[12] Tom Christiansen commented, “That wouldn’t be my own first choice, but it is a good one. My nominee for the most common error in programs is that they are usually never rewritten. When you take English composition, you are often asked to turn in an initial draft and then a final revision, separately. This process is just as important in programming. You’ve heard the adage ‘Never ship the prototype.’ Well, that’s what’s happening: people hack things out and never rewrite them for clarity and efficiency.”