This section is old and incomplete, even as an introduction. Although it covers a number of different aspects of perl, this section really needs to be redone. Any volunteers?
If you plan to do anything serious on the Web, I suggest that you learn perl. In fact, if you plan to do anything serious on your machine, then learning perl is also a good idea. Although not available on a lot of commercial versions of UNIX, perl is almost universally available with Linux.
Now, I am not saying that you shouldn’t learn sed, awk, and shell programming. Rather, I am saying that you should learn all four. Both sed and awk have been around for quite a while, so they are deeply ingrained in the thinking of most system administrators. Although you could easily find a shell script on the system that didn’t have elements of sed or awk in it, you would be very hard pressed to find a script that had no shell programming in it. On the other hand, most of the scripts that process information from other programs use either sed or awk. Therefore, it is likely that you will eventually come across one or the other.
perl is another matter altogether. None of the standard scripts have perl in them. This does not say anything about the relative value of perl, but rather the relative availability of it. Because it can be expected that awk and sed are available, it makes sense that they are commonly used. perl may not be on your machine and including it in a system shell script might cause trouble.
In this section, I am going to talk about the basics of perl. We’ll go through the mechanics of creating perl scripts and the syntax of the perl language. There are many good books on perl, so I would direct you to them to get into the nitty-gritty. Here we are just going to cover the basics. Later on, we’ll address some of the issues involved with making perl scripts to use on your Web site.
One aspect of perl that I like is that it contains the best of everything. It has aspects of C, shell, awk, sed and many other things. perl is also free. The source code is readily available and the versions that I have came with configuration scripts that determined the type of system I had and set up the make-files accordingly. Aside from Linux, I was able to compile the exact same source on my Sun Solaris workstation. Needless to say, the scripts that I write at home run just as well at work.
I am going to make assumptions as to what level of programming background you have. If you read and understood the sections on sed, awk, and the shell, then you should be ready for what comes next. In this section, I am going to jump right in. I am not going to amaze you with demonstrations of how perl can do I/O, as that’s what we are using it for in the first place. Instead, I am going to assume that you want to do I/O and jump right into how to do it.
Let’s create a script called hello.pl. The pl extension has no real meaning, although I have seen many places where it is always used as an extension. It is more or less conventional to do this, just as text files traditionally have the extension .txt, shell scripts end in .sh, etc.
We’ll start off with the traditional
print "Hello, World!\n";
This script consists of a single perl statement, whose purpose is to output the text inside the double quotes. Each statement in perl is followed by a semicolon. Here, we are using the perl print function to output the literal string "Hello, World!\n" (including the trailing newline). Although we don’t see it, there is an implied file handle to stdout. The equivalent command with the explicit reference would be
print STDOUT "Hello, World!\n";
Along with STDOUT, perl has the default file handles STDIN and STDERR. Here is a quick script that demonstrates all three as well as introduces a couple of familiar programming constructs:
while (<STDIN>)
{
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        print STDOUT "Input: $_ \n";
    }
}
Functioning the same as in C and most shells, the while line at the top says that as long as there is something coming from STDIN, do the loop. Here we have the special format (<STDIN>), which tells perl where to get input. If we wanted, we could use a file handle other than STDIN. However, we’ll get to that in a little bit.
One thing that you need to watch out for is that you must include blocks of statements (such as after while or if statements) inside the curly braces ({}). This is different from the way you do it in C, where a single line can follow while or if. For example, this statement is not valid in perl:
while ( $a < $b )
$a++;
You would need to write it something like this:
while ( $a < $b ) {
$a++;
}
Inside the while loop, we get to an if statement. We compare the special variable $_ with "\n" to see whether the line is empty. The variable $_ serves several functions. In this case, it represents the line we are reading from STDIN. In other cases, it represents the pattern space, as in sed. If the line we just read in is equal to the newline character (just a blank line, meaning that only the Enter key was pressed), we print an error message to STDERR; otherwise, we print the line to STDOUT. Both branches use the print function, which has the syntax
print [filehandle] "text_to_print";
In the first case, the file handle is STDERR and in the second case it is STDOUT. In each case, we could have left out the file handle and the output would go to STDOUT.
Each time we print a line, we need to include a newline (\n) ourselves.
We can format the print line in different ways. In the second print line, where the input is not a blank line, we can print “Input: ” before we print the line just input. Although this is a very simple way of outputting lines, it gets the job done. More complex formatting is possible with the perl printf function. Like its counterpart in C or awk, you can come up with some very elaborate outputs. We’ll get into more details later.
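To try the loop without typing at the keyboard, here is a minimal sketch that reads from an in-memory file handle instead of STDIN. The sample text, the lexical handle $in, and the two counters are my own assumptions for illustration; the branch logic is the same as in the script above.

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory file handle stands in for STDIN.
my $text = "first line\n\nsecond line\n";
open(my $in, "<", \$text) or die "cannot open in-memory handle: $!";

my ($blank, $nonblank) = (0, 0);
while (<$in>) {
    if ($_ eq "\n") {
        $blank++;                      # the line holds only the newline
        print STDERR "Error: empty line\n";
    } else {
        $nonblank++;
        print STDOUT "Input: $_";      # $_ still ends with its own newline
    }
}
close($in);

print "blank=$blank nonblank=$nonblank\n";
```

Because $_ keeps the newline that was read, the sketch does not add a second one when echoing the line.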
One more useful function for processing lines of input is split. As its name implies, the split function is used to split a line based on a field separator that you define, for example, a space. The line is then stored in an array as individual elements. So, in our example, if we wanted to input multiple words and have them parsed correctly, we could change the script to look like this:
while (<STDIN>)
{
    @field = split(/ /,$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        print STDOUT "$_ \n";
        print $field[0];
        print $field[1];
        print $field[2];
    }
}
The split function has the syntax
split(pattern,line);
where pattern is our field separator and line is the input line. So our line
@field = split(/ /,$_);
says to split the line we just read in (stored in $_) and use a space (/ /) as the field separator. Each field is then placed into an element of the array field. The @ is needed in front of the variable field to indicate that it’s an array. In perl, there are several types of variables. The first kind we have already met before. The special variable $_ is an example of a scalar variable. Each scalar variable is preceded by a dollar sign ($) and can contain a single value, whether a character string or a number. How does perl tell the difference? It depends on the context. perl will behave correctly by looking at what you tell it to do with the variable. Other examples of scalars are
$name = "jimmo";
$initial = "j";
$answertolifetheuniverseandeverything = 42;
Another kind of variable is an array, as we mentioned before. If we precede a variable with %, we also have an array. But don’t we have an array with @? Yes, so what’s the difference? The difference is that arrays starting with @ are referenced by numbers, while those starting with % (associative arrays) are referenced by a string. We’ll get to how that works as we move along.
In our example, we are using the split function to fill up the array @field. This array will be referenced by number. We see the way it is referenced in the three print statements toward the end of the script.
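As a small sketch of referencing an array by number, the following splits a made-up line (my own sample, with the newline already removed) and looks at individual elements. Using scalar() to count the elements and the offset -1 for the last element are standard perl, but they are not something the script above relies on.

```perl
use strict;
use warnings;

my $line  = "one two three";        # hypothetical input line
my @field = split(/ /, $line);      # a space is the field separator

print "count: ", scalar(@field), "\n";   # number of elements
print "first: $field[0]\n";              # offsets start at 0
print "last:  $field[-1]\n";             # -1 counts from the end
```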
If our input line had a different field separator (for example, %), the line might look like this:
@field = split(/%/,$_);
In this example, we are outputting the first three words that are input. But what if there are more words? Obviously we just add more print statements. What if there are fewer words? Now we run into problems. In fact, we run into problems when adding more print statements. The question is, where do we stop? Do we set a limit on the number of words that can be input? Well, we can avoid all of these problems by letting the system count for us. Changing the script a little, we get
while (<STDIN>)
{
    @field = split(/ /,$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        foreach $word (@field){
            print $word,"\n";
        }
    }
}
In this example, we introduce the foreach construct. This has the same behavior as a for loop. In fact, in perl, for and foreach are interchangeable, provided you have the right syntax. In this case, the syntax is
foreach $variable (@array)
where $variable is our loop variable and @array is the name of the array. When the script is run, @array is expanded to its components. So, if we had input four fruits, our line might have looked like this:
foreach $word ("apple","banana","cherry","orange")
Because I don’t know how many elements there are in the array field, foreach comes in handy. In this example, every word separated by a space will be printed on a line by itself, like this:
perl script.pl
one two three
one
two
three
^D
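To see that foreach copes with any number of words, here is a minimal sketch that runs the same split and loop over a few made-up lines of different lengths. The sample lines and the @seen array, which only collects the words so the result can be inspected, are my own additions.

```perl
use strict;
use warnings;

my @seen;
# Hypothetical input lines with three, two, and one word.
foreach my $line ("one two three", "a b", "solo") {
    my @field = split(/ /, $line);
    foreach my $word (@field) {      # runs once per element, however many there are
        print $word, "\n";
        push @seen, $word;
    }
}
```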
Our next enhancement is to change the field separator. This time we’ll use an ampersand (&) instead. The split line now looks like this:
@field = split(/&/,$_);
When we run the script again with the same input, what we get is a bit different:
# perl script.pl
one two three
one two three
The reason why we get the output on one line is because the space is no longer a field separator. If we run it again, this time using &, we get something different:
# perl script.pl
one&two&three
one
two
three
This time, the three words were recognized as separate fields.
Although it doesn’t seem too likely that you would be inputting data like this from the keyboard, it is conceivable that you might want to read a file that has data stored like this. To make things easy, I have provided a file that represents a simple database of books. Each line is a record and represents a single book, with the fields separated by &.
To be able to read from a file, we must create a file handle. To do this, we add a line and change the while statement so it looks like this:
open ( INFILE,"< bookdata.txt");
while (<INFILE>)
The syntax of the open function is
open(file_handle,openwhat_&_how);
The way we open a file depends on the way we want to read it. Here, we use standard shell redirection symbols to indicate how we want to read the specified file. In our example, we indicate redirection from the file bookdata.txt. This says we want to read from the file. If we wanted to open the file for writing, the line would look like this:
open ( INFILE,"> bookdata.txt");
If we wanted to append to the file, we could change the redirections so the line would look like this:
open ( INFILE,">> bookdata.txt");
Remember I said that we use standard redirection symbols. This also includes the pipe symbol. As the need presents itself, your perl script can open a pipe for either reading or writing. Assuming that we want to open a pipe for writing that sends the output through sort, the line might look like this:
open ( INFILE,"| sort ");
Remember that this would work the same as from the command line. Therefore, the output is not being written to a file; it is just being piped through sort. However, we could redirect the output of sort, if we wanted. For example:
open ( INFILE,"| sort > output_file");
This opens the file output_file for writing, but the output is first piped through sort. In our example, we are opening the file bookdata.txt for reading. The while loop continues through and outputs each line read. However, instead of being on a single line, the individual fields (separated by &) are each output on a separate line.
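The three redirection modes can be tried safely with a scratch file. The sketch below is only an illustration under my own assumptions: it uses the core File::Temp module to get a temporary file name so that bookdata.txt is not touched, and the handle names OUT and IN and the sample words are made up. It also checks the return value of open, which the examples above leave out.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file, removed automatically when the script exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

open(OUT, "> $path")  or die "cannot write $path: $!";      # > truncates and writes
print OUT "cherry\n";
close(OUT);

open(OUT, ">> $path") or die "cannot append to $path: $!";  # >> appends
print OUT "apple\n";
close(OUT);

open(IN, "< $path")   or die "cannot read $path: $!";       # < reads
my @lines = <IN>;     # in list context, <IN> returns all remaining lines
close(IN);

print @lines;
```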
We can now take this one step further. Let’s now assume that a couple of the fields are actually composed of subfields. These subfields are separated by a plus sign (+). We now want to break up every field containing + into its individual subfields.
As you have probably guessed, we use the split command again. This time, we use a different variable and instead of reading out of the input line ($_), we read out of the string $field. Therefore, the line would look like this:
@subfield = split(/\+/,$field);
Aside from changing the search pattern, I added the back slash (\) because + is used in the search pattern to represent one or more occurrences of the preceding character. If we don’t escape it, we generate an error. The whole script now looks like this:
open(INFILE,"<bookdata.txt");
while (<INFILE>)
{
    @data = split(/&/,$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        foreach $field (@data){
            @subfield = split(/\+/,$field);
            foreach $word (@subfield){
                print $word,"\n";
            }
        }
    }
}
If we wanted, we could have written the script to split the incoming lines at both & and +. This would have given us a split line that looked like this:
@data = split(/[&\+]/,$_);
The reason for writing the script like we did was that it was easier to separate subfields and still maintain their relationships. Note that the search pattern used here could have been any regular expression. For example, we could have split the strings every place there was the pattern Di followed by e, g, or r, but not if it was followed by i. The regular expression would be
Di[reg][^i]
so the split function would be:
@data = split(/Di[reg][^i]/,$_);
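The difference between splitting in two steps and splitting on both characters at once is easy to see on a single record. The record below is invented for illustration and is not a line from bookdata.txt. Note that inside a character class the + does not need the backslash, although adding it does no harm.

```perl
use strict;
use warnings;

# Hypothetical record: three fields, the second with two subfields.
my $record = "Title&Author One+Author Two&Publisher";

my @data     = split(/&/, $record);      # fields only
my @subfield = split(/\+/, $data[1]);    # subfields of the second field
my @all      = split(/[&+]/, $record);   # both at once, relationships lost

print scalar(@data), " fields, ", scalar(@subfield), " subfields, ",
      scalar(@all), " pieces in the flat split\n";
```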
At this point, we can read in lines from an ASCII file, separate the lines based on what we have defined as fields, and then output each line. However, the lines don’t look very interesting. All we are seeing is the content of each field and do not know what each field represents. Let’s change the script once again. This time we will make the output show us the field names as well as their content.
Let’s change the script so that we have control over where the fields end up. We still use the split statement to extract individual fields from the input string. This is not necessary because we can do it all in one step, but I am doing it this way to demonstrate the different constructs and to reiterate that in perl, there is always more than one way to do something. So, we end up with the following script:
open(INFILE,"< bookdata.txt");
while (<INFILE>)
{
    @data = split(/&/,$_);
    if ( $_ eq "\n" )
    {
        print STDERR "Error: \n";
    } else {
        $fields = 0;
        foreach $field (@data){
            $fieldarray[$fields] = $field;
            print $fieldarray[$fields++]," ";
        }
    }
}
Each time we read a line, we first split it into the array @data, which is then copied into the fields array. Note that there is no new line in the print statement, so each field will be printed with just a space and the newline read at the end of each input line will then be output. Each time through the loop, we reset our counter (the variable $fields) to 0.
Although the array is re-filled every time through the loop and we lose the previous values, we could assign the values to specific variables.
Let’s now make the output a little more attractive by outputting the field headings first. To make things simpler, let’s label the fields as follows
title, author, publisher, char0, char1, char2, char3, char4, char5
where char0-char5 are simply characteristics of a book. We need a handful of if statements to make the assignment, which look like this:
foreach $field (@data){
    if ( $fields == 0 ){
        print "Title: ",$field;
    }
    if ( $fields == 1 ){
        print "Author: ",$field;
    }
    *
    *
    *
    if ( $fields == 8 ){
        print "Char 5: ",$field;
    }
Here, too, we would be losing the value of each variable every time through the loop as they get overwritten. Let’s just assume we only want to save this information from the first line (our reasoning will become clear in a minute). First we need a counter to keep track of what line we are on and an if statement to enter the block where we make the assignment. Rather than a print statement, we change the line to an assignment, so it might look like this:
$title = $field;
When we read subsequent lines, we can output headers for each of the fields. We do this by having another set of if statements that output the header and then the value, which is based on its position.
Actually, there is a way of doing things a little more efficiently. When we read the first line, we can assign the values to variables on a single line. Instead of the line
foreach $field (@data) {
we add the if statement to check if this is the first line. Then we add the line
($field0,$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8)=
split(/&/,$_);
Rather than assigning values to elements in an array, we are assigning them to specific variables. (Note that if there are more fields generated by the split command than we specified variables for, the remaining fields are ignored.) The other advantage of this is that we saved ourselves a lot of space. We could also call these $field1, $field2, etc., thereby making the field names a little more generic. We could also modify the split line so that instead of several separate variables, we have them in a single array called field and we could use the number as the offset into the array. Therefore, the first field would be referenced like this:
$field[0]
The split command for this would look like this
@field=split(/&/,$_);
which looks like something we already had. It is. This is just another example of the fact that there are always several different ways of doing things in perl.
At this point, we still need the series of if statements inside of the foreach loop to print out the line. However, that seems like a lot of wasted space. Instead, I will introduce the concept of an associative list. An associative list is just like any other list, except that you reference the elements by a label rather than a number.
Another difference is that associative arrays, also referred to as associative lists, are always an even length. This is because elements come in pairs: label and value. For example, we have:
%list = ("name","James Mohr", "logname","jimmo", "department","IS");
Note that instead of $ or @ to indicate that this is an array, we use %. This specifies that this is an associative array, so we can refer to the value by label; however, when we finally reference the value, we use $. To print out the name, the line would look like this:
print "Name:",$list{name};
Also, the brackets we use are different. Here we use curly braces ({}) instead of square brackets ([]).
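Here is a short sketch of the same associative array in use. The loop over sort keys is my own addition to show every label with its value; keys and sort are standard perl functions, and $pairs is just a made-up variable holding the number of label/value pairs.

```perl
use strict;
use warnings;

my %list = ("name", "James Mohr", "logname", "jimmo", "department", "IS");

print "Name: ", $list{"name"}, "\n";     # curly braces, referenced by label

foreach my $label (sort keys %list) {    # keys returns the labels
    print "$label = $list{$label}\n";
}

my $pairs = scalar(keys %list);          # six list elements make three pairs
print "pairs: $pairs\n";
```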
The introduction of the associative array allows us to define field labels within the data itself and access the values using these labels. The first line of the data file contains the field labels. We can use these labels to reference the values. Let’s look at the program itself:
open(INFILE,"< bookdata.txt");
$lines=0;
while (<INFILE>)
{
    chop;
    @data = split(/&/,$_);
    if ( $lines == 0 )
    {
        @headlist=split(/&/,$_);
        foreach $field (0..@headlist-1){
            %headers = ( $headlist[$field], "" );
        }
        $lines++;
    } else {
        foreach $field (0..@data-1){
            $headers{$headlist[$field]}=@data[$field];
            print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
        }
    }
}
At the beginning of the script, we added the chop function, which “chops” off the last character of a list or variable and returns that character. If you don’t mention the list or variable, chop affects the $_ variable. This function is useful to chop off the newline character that gets read in. The next change is that we removed the block that checked for blank lines and generated an error.
The first time we read a line, we entered the appropriate block. Here, we just read in the line containing the field labels and we put each entry into the array headlist via the split function. The foreach loop also added some new elements:
foreach $field (0..@headlist-1){
%headers = ( $headlist[$field], "" );
}
The first addition is the element (0.. @headlist-1). Two numbers separated by two dots indicate a range. We can use @headlist as a variable to indicate how many elements are in the array headlist. This returns a human number, not a computer number (one that starts at 0). Because I chose to access all my variables starting with 0, I needed to subtract 1 from the value of @headlist. There are nine elements per line in the file bookdata.txt; therefore, their range is 0..9-1.
However, we don’t need to know that! In fact, we don’t even know how many elements there are to make use of this functionality. The system knows how many elements it read in, so we don’t have to. We just use @headlist-1 (or whatever).
The next line fills in the elements of our associative array:
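%headers = ( $headlist[$field], "" );
However, we are only filling in the labels and not the values themselves. Therefore, the second element of the pair is empty (""). One by one, we write the label into the first element of each pair. Strictly speaking, assigning a list to %headers replaces the entire array, so only the last label survives the loop; this does no harm here because every entry is assigned again when the values are loaded.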
After the first line is read, we load the values. Here again, we have a foreach loop that goes from 0 to the last element of the array. Like the first loop, we don’t need to know how many elements were read, as we let the system keep track of this for us. The second element in each pair of the associative list is loaded with this line:
$headers{$headlist[$field]}=@data[$field];
Let’s take a look at this line starting at the right end. From the array @data (which is the line we just read in), we are accessing the element at the offset that is specified by the variable $field. Because this is just the counter used for our foreach loop, we go through each element of the array data one by one. The value retrieved is then assigned to the left-hand side.
On the left, we have an array offset being referred to by an array offset. Inside we have
$headlist[$field]
The array headlist is what we filled up in the first block. In other words, the list of field headings. When we reference the offset with the $field variable, we get the field heading. This will be used as the string for the associative array. The element specified by
$headers{$headlist[$field]}
corresponds to the field value. For example, if the expression
$headlist[$field]
evaluated to title, the second time through the loop, the expression $headers{$headlist[$field]} would evaluate to “2010: Odyssey Two.”
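The pairing of labels and values can be tried without the data file. The sketch below uses a made-up two-line extract in the style of bookdata.txt (the real file’s contents are not shown here, so the field names and the record are assumptions). Instead of the counter loop it uses a hash slice, @headers{@headlist}, which is a different technique from the one in the script: it assigns value N to label N in one statement.

```perl
use strict;
use warnings;

# Hypothetical extract: a label line followed by one record.
my @lines = ("title&author&char0",
             "2010: Odyssey Two&Arthur C. Clarke&scifi");

my @headlist = split(/&/, shift @lines);   # the first line holds the labels
my %headers;

foreach my $line (@lines) {
    my @data = split(/&/, $line);
    @headers{@headlist} = @data;           # hash slice: label N gets value N
    foreach my $label (@headlist) {
        print "$label: $headers{$label}\n";
    }
}
```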
At this point, we are ready to make our next jump. We are going to add the functionality to search for specific values in the data. Let’s assume that we know what the fields are and wish to search for a particular value. For example, we want all books that have scifi as field char0. Assuming that the script was called book.pl, we would specify the field label and value like this:
perl book.pl char0=scifi
Or we could add #!/usr/bin/perl to the top of the script to force the system to use perl as the interpreter. We would run the script like this:
book.pl char0=scifi
The completed script looks like this:
($searchfield,$searchvalue) = split(/=/,$ARGV[0]);
open(INFILE,"< bookdata.txt");
$lines=0;
while (<INFILE>)
{
    chop;
    @data = split(/&/,$_);
    if ( $_ eq "" )    # chop has already removed the newline
    {
        print STDERR "Error: \n";
    } else {
        if ( $lines == 0 )
        {
            @headlist=split(/&/,$_);
            foreach $field (0..@headlist-1){
                %headers = ( $headlist[$field], "" );
            }
            $lines++;
        } else {
            foreach $field (0..@data-1){
                $headers{$headlist[$field]}=@data[$field];
                if ( ($searchfield eq $headlist[$field] ) &&
                     ($searchvalue eq $headers{$headlist[$field]} )) {
                    $found=1;
                }
            }
        }
    }
    if ( $found == 1 )
    {
        foreach $field (0..@data-1){
            print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
        }
    }
    $found=0;
}
We added a line at the top of the script that splits the first argument on the command line:
($searchfield,$searchvalue) = split(/=/,$ARGV[0]);
Note that we are accessing $ARGV[0]. This is not the command being called, as one would expect in a C or shell program. Our command line has the string char0=scifi as its $ARGV[0]. After the split, we have $searchfield=char0 and $searchvalue=scifi.
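A minimal sketch of the argument handling follows. The helper name parse_query and the fallback sample argument are my own assumptions, added so the fragment runs even when no argument is given. It also passes 2 as the optional third argument to split, which limits the result to two pieces, so a search value that itself contains = stays in one piece.

```perl
use strict;
use warnings;

# parse_query is a hypothetical helper name, not part of the script above.
sub parse_query {
    my ($arg) = @_;
    my ($field, $value) = split(/=/, $arg, 2);   # split at the first = only
    return ($field, $value);
}

# Fall back to a sample argument when none is given on the command line.
my $arg = @ARGV ? $ARGV[0] : "char0=scifi";
my ($searchfield, $searchvalue) = parse_query($arg);
print "field: $searchfield\n";
```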
Some other new code looks like this:
if ( ($searchfield eq $headlist[$field] ) &&
($searchvalue eq $headers{$headlist[$field]} )) {
$found=1;
Instead of outputting each line in the second foreach loop, we are changing it so that here we are checking to see if the field we input, $searchfield, is the one we just read in $headlist[$field] and if the value we are looking for, ($searchvalue), equals the one we just read in.
Here we add another new concept: logical operators. These are just like in C, where && means a logical AND and || is a logical OR. If we want a logical comparison of two variables and each has a specific value, we use the logical AND, like
if ( $a == 1 && $b == 2)
which says if $a equals 1 AND $b equals 2, execute the following block. If we wrote it like this
if ( $a == 1 || $b == 2)
it would read as follows: if $a equals 1 OR $b equals 2, execute the block. In our example, we are saying that if the search field ($searchfield) equals the corresponding value in the heading list ($headlist[$field]) AND the search value we input ($searchvalue) equals the value from the file ($headers{$headlist[$field]}), we then execute the following block. Our block is simply a flag to say we found a match.
Later, after we read in all the values for each record, we check the flag. If the flag was set, the foreach loop is executed:
if ( $found == 1 )
{
foreach $field (0..@data-1){
print $headlist[$field],": ", $headers{$headlist[$field]},"\n";
}
Here we output the headings and then their corresponding values. But what if we aren’t sure of the exact text we are looking for? For example, what if we want all books by the author Eddings, but do not know that his first name is David? It’s now time to introduce the perl function index. As its name implies, it delivers an index. The index it delivers is an offset of one string in another. The syntax is
index(STRING,SUBSTRING,POSITION)
where STRING is the name of the string that we are looking in, SUBSTRING is the substring that we are looking for, and POSITION is where to start looking. That is, what position to start from. If POSITION is omitted, the function starts at the beginning of STRING. For example
index("applepie","pie");
will return 5, as the substring pie starts at position 5 of the string applepie. To take advantage of this, we only need to change one line. We change this
if ( ($searchfield eq $headlist[$field] ) &&
($searchvalue eq $headers{$headlist[$field]} )) {
to this
if ( (index($headlist[$field],$searchfield)) != -1 &&
index($headers{$headlist[$field]},$searchvalue) != -1 ) {
Here we are checking that the offset is not -1. An offset of -1 indicates the condition where the substring is not within the string. (The offset comes before the start of the string.) So, if we were to run the script like this
script.pl author=Eddings
we would look through the field author for any entry containing the string Eddings. Because there are records with an author named Eddings, if we looked for Edding, we would still find it because Edding is a substring of “David Eddings.”
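The behavior of index is easy to check on a few literal strings. In this sketch the variable names and the strings other than applepie and “David Eddings” are made up for illustration.

```perl
use strict;
use warnings;

my $pos     = index("applepie", "pie");          # found at offset 5
my $missing = index("applepie", "cherry");       # not found: -1
my $later   = index("pie and pie", "pie", 1);    # start looking at offset 1
my $partial = index("David Eddings", "Edding");  # a substring is enough

print "pos=$pos missing=$missing later=$later partial=$partial\n";
```

Offsets are counted from 0, so the first character of the string is at position 0.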
As you might have noticed, we have a limitation in this mechanism. We must ensure that we spell things with the right case. Because Eddings is uppercase both on the command line and in the file, there is no problem. Normally names are capitalized, so it would make sense to input them as such. But what about the title of a book? Often, words like “the” and “and” are not capitalized. However, what if the person who input the data, input them as capitals? If you looked for them in lowercase, but they were in the file as uppercase, you’d never find them.
To consider this possibility, we need to compare both the input and the fields in the file in the same case. We do this by using the tr (translate) function. It has the syntax
tr/SEARCHLIST/REPLACEMENTLIST/[options]
where SEARCHLIST is the list of characters to look for and REPLACEMENTLIST is the characters to use to replace those in SEARCHLIST. To see what options are available, check the perl man-page. We change part of the script to look like this:
foreach $field (0..@data-1){
$headers{$headlist[$field]}=@data[$field];
($search1 = $searchfield) =~ tr/A-Z/a-z/;
($search2 = $headlist[$field] ) =~ tr/A-Z/a-z/;
($search3 = $searchvalue)=~tr/A-Z/a-z/;
($search4 = $headers{$headlist[$field]})=~tr/A-Z/a-z/;
if ( (index($search2,$search1) != -1) && (index($search4,$search3) != -1) ) {
$found=1;
}
}
In the middle of this section are four lines where we do the translations. This demonstrates a special aspect of the tr function. We can do a translation as we are assigning one variable to another. This is useful because the original strings are left unchanged. We must change the statement with the index function and make comparisons to reflect the changes in the variables.
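The copy-then-translate idiom can be seen in isolation in the sketch below. The stored value and the search word are assumptions chosen for illustration. For the search word I use the built-in lc function, which is another way to get a lowercase copy and is not what the script above uses.

```perl
use strict;
use warnings;

my $stored = "The Diamond Throne";      # hypothetical field value
my $wanted = "diamond";                 # hypothetical search value

(my $lower = $stored) =~ tr/A-Z/a-z/;   # copy first, then translate the copy

my $found = (index($lower, lc($wanted)) != -1) ? 1 : 0;

print "original: $stored\n";            # unchanged
print "lowered:  $lower\n";
print "found:    $found\n";
```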
So at this point, we have created an interface in which we can access a “database” and search for specific values.
When writing conditional statements, you must be sure of the condition you are testing. Truth, like many other things, is in the eye of the beholder. In this case, it is the perl interpreter that is beholding your concept of true. It may not always be what you expect. In general, you can say that a value is true unless it is the null string (""), the number zero (0), or the literal string zero ("0").
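A quick way to get a feeling for this is to test a handful of values. The list of candidates below is my own choice; it includes a few strings that look like zero but are not the literal string "0", and which therefore count as true.

```perl
use strict;
use warnings;

my @candidates = ("", 0, "0", "0.0", "00", " ", "a");

foreach my $value (@candidates) {
    printf "%-6s is %s\n", "'$value'", ($value ? "true" : "false");
}

# Collect the values that perl treats as false.
my @false = grep { !$_ } @candidates;
print scalar(@false), " of ", scalar(@candidates), " values are false\n";
```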
One important feature of perl is the comparison operators. Unlike C, there are different operators for numeric comparison and for string comparison. They’re all easy to remember and you have certainly seen both sets before, but keep in mind that they are different. Table 0-8 contains a list of the perl comparison operators and Table 0-9 contains a list of perl operations.
Table 0-8 perl Comparison Operators
Numeric | String | Comparison
== | eq | equal to
!= | ne | not equal to
> | gt | greater than
< | lt | less than
>= | ge | greater than or equal to
<= | le | less than or equal to
<=> | cmp | comparison with a signed result (0 if the operands are equal, -1 if the first is less, 1 if the first is greater)
Another important aspect that you need to keep in mind is that there is really no such thing as a numeric variable. Well, sort of. perl is capable of distinguishing between numbers and strings without you interfering. If a variable is used in a context where it can only be a string, then that’s the way perl will interpret it.
Let’s take two variables: $a=2 and $b=10. As you might expect, the expression $a < $b evaluates to true because we are using the numeric comparison operator <. However, if the expression were $a lt $b, it would evaluate to false. This is because the string “10” comes before “2” lexicographically (it comes first alphabetically).
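The same two values can be run through both sets of operators. In this sketch I use $x and $y rather than $a and $b, because perl’s sort function gives $a and $b a special meaning; the other variable names are made up as well.

```perl
use strict;
use warnings;

my ($x, $y) = (2, 10);

my $numeric = ($x <  $y) ? 1 : 0;   # numeric comparison: 2 is less than 10
my $string  = ($x lt $y) ? 1 : 0;   # string comparison: "2" sorts after "10"
my $ncmp    = $x <=> $y;            # -1, the first operand is numerically less
my $scmp    = $x cmp $y;            #  1, the first operand is the greater string

print "numeric=$numeric string=$string ncmp=$ncmp scmp=$scmp\n";
```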
Besides simply translating sets of letters, perl can also do substitution. To show you this, I am going to show you another neat trick of perl. Having been designed as a text and file processing language, it is very common to read in a number of lines of data and process them all in turn. We can tell perl that it should assume we want to read in lines although we don’t explicitly say so. Let’s take a script that we call fix.pl. This script looks like this:
s/James/JAMES/g;
s/Eddings/EDDINGS/g;
This syntax is the same as you would find in sed; however, perl has a much larger set of regular expressions. Running this as a script by itself would accomplish nothing, because nothing reads the input lines into $_; instead, we run it like this:
perl -p fix.pl bookdata.txt
The -p option tells perl to put a wrapper around your script. Therefore, our script would behave as though we had written it like this:
while (<>) {
s/James/JAMES/g;
s/Eddings/EDDINGS/g;
} continue {
print;
}
This would read each line from a file specified on the command line, carry out the substitution, and then print out each line, changed or not. We could also take advantage of the ability to specify the interpreter with #!. The script would then look like
#!/usr/bin/perl -p
s/James/JAMES/g;
s/Eddings/EDDINGS/g;
Another command line option is -i. This stands for “in-place,” and with it you can edit files “in-place.” In the example above, the changed lines would be output to the screen and we would have to redirect them to a file ourselves. The -i option takes an argument, which indicates the extension you want for the old version of the file. So, to use the option, we would change the first line, like this:
#!/usr/bin/perl -pi.old
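The same effect can be had from inside a script by setting the special variable $^I, which holds the backup extension that -i would otherwise supply. What follows is only a sketch under that assumption; the subroutine name fix_in_place is invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fix_in_place() is a hypothetical helper that mimics "perl -pi.old".
# Setting $^I turns on in-place editing for the <> operator, and
# @ARGV tells <> which files to read.
sub fix_in_place {
    my @files = @_;
    local ($^I, @ARGV) = ('.old', @files);
    while (<>) {
        s/James/JAMES/g;
        s/Eddings/EDDINGS/g;
        print;    # goes to the rewritten file, not to the screen
    }
}

# Edit any files named on the command line; do nothing if there are none.
fix_in_place(@ARGV) if @ARGV;
```

Each file named on the command line is rewritten, and the original is kept under the same name with .old appended.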
With perl, you can also make your own subroutines. These subroutines can be written to return values, so that you have functions as well. Subroutines are first defined with the sub keyword and are called using & (in perl 5, the & is optional when the call uses parentheses). For example:
#!/usr/bin/perl
sub usage {
print "Invalid arguments: @ARGV\n";
print "Usage: $0 [-t] filename\n";
}
if ( @ARGV < 1 || @ARGV > 2 ) {
&usage;
}
This says that if the number of arguments from the command line @ARGV is less than 1 or greater than 2, we call the subroutine usage, which prints out a usage message.
To create a function, we first create a subroutine. When we call the subroutine, we call it as part of an expression. The value returned by the subroutine/function is the value of the last expression evaluated.
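As a small illustration of returning the last expression evaluated, consider the sketch below. The subroutine larger is a made-up example; there is no return statement in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# larger() is a hypothetical function. The value of the last
# expression evaluated becomes the value of the call.
sub larger {
    my ($x, $y) = @_;
    $x > $y ? $x : $y;
}

# The call is used as part of a larger expression.
my $total = larger(2, 10) + 1;
print "The larger value plus one is $total\n";   # prints 11
```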
Let's create a function that prompts you for a yes/no response:
#!/usr/bin/perl
if (&getyn("Do you *really* want to remove all the files in this directory? ")
eq "y\n" )
{
print "Don't be silly!\n";
}
sub getyn{
print @_;
$response = (<STDIN>);
}
This is a very simple example. In the subroutine getyn, we output everything that is passed to the subroutine. This serves as a prompt. We then assign the line we get from stdin to the variable $response. Because this is the last expression inside the subroutine to be evaluated, this is the value that is returned to the calling statement.
The calling if statement passes the actual prompt as an argument to the subroutine, so the getyn subroutine could be used in other circumstances with a different prompt. If we enter “y,” the value returned includes the newline from the Enter key; therefore, we must check for "y\n". This is not just “y,” but rather “y” followed by a newline.
Alternatively, we could check the response inside the subroutine. In other words, we could have added the line
$response =~ /^y/i;
We addressed the =~ characters earlier in connection with the tr function. Here, however, the variable on the left-hand side is not changed; it is only tested against the pattern-matching construct on the right: /^y/i. This has the same behavior as in sed, where we are looking for a y at the beginning of the line. The trailing i simply says to ignore the case. If the response begins with a y or Y, the expression evaluates to 1; if not, it evaluates to a null string.
We now change the calling statement and simply leave off the comparison to "y\n". Because the return value of the subroutine is the value of the last expression evaluated, the value returned now is either 1 or a null string. Therefore, we don't have to do any kind of comparison, as the if statement will react according to the return value.
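Putting these pieces together might look something like the sketch below. Splitting the test out into a separate subroutine called is_yes is my own choice for this illustration, as is the check for an undefined response at end of input; neither is required.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# is_yes() is a hypothetical helper. Its last expression is the
# pattern match, so it returns 1 for a response beginning with
# y or Y and a null string otherwise.
sub is_yes {
    my ($response) = @_;
    defined($response) && $response =~ /^y/i;
}

# getyn() prints its arguments as the prompt, reads one line from
# stdin, and returns the result of the test.
sub getyn {
    print @_;
    my $response = <STDIN>;
    is_yes($response);
}

# Try the test on some sample responses, newline included.
foreach my $answer ("y\n", "Yes\n", "no\n") {
    my $text = $answer;
    chomp $text;
    print "$text: ", (is_yes($answer) ? "yes" : "not yes"), "\n";
}
```

The calling statement then needs no comparison at all: the if condition is simply the call to getyn with the prompt as its argument.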
I wish I could go on. I haven’t even hit on a quarter of what perl can do. Unfortunately, like the sections on sed and awk, more details are beyond the scope of this book. Instead, I want to refer you to a few other sources. First, there are two books from O’Reilly and Associates. The first is Learning perl by Randal Schwartz. This is a tutorial. The other is Programming perl by Larry Wall and Randal Schwartz. If you are familiar with other UNIX scripting languages, I feel you would be better served by getting the second book.
The next suggestion I have is that you get the perl CD-ROM from Walnut Creek CD-ROM (www.cdrom.com). This is loaded with hundreds of megabytes of perl code, and the April 1996 version, which I used, contains the source code for perl 4 (4.036) and perl 5 (5.000m). In many cases, I like this approach better because I can see how to do the things I need to do. Books are useful to get the basics and reminders of syntax, options, etc. However, seeing someone else's code shows me how to do it.
Another good CD-ROM is the Mother of PERL CD from InfoMagic (www.infomagic.com). It, too, is loaded with hundreds of megabytes of perl scripts and information.
There are a lot of places to find sample scripts while you are waiting for the CD to arrive. One place is the Computers and Internet: Programming Languages: Perl hierarchy at Yahoo (www.yahoo.com). You can use this as a springboard to many sites that not only have information on perl but also data on using perl on the Web (e.g., in CGI scripts).