Looking for Files
In the section on Interacting with the System we talked about
using the ls command to look for files. There we had the example of looking in
the sub-directory ./letters/taxes for specific files.
With the ls command, we could list that directory and look through the output.
What if the taxes directory contained a subdirectory for each of the past
five years, each of those contained a subdirectory for each month, each of
those contained subdirectories for federal, state, and local taxes, and each
of those contained 10 letters?
If we knew that the letter we were looking for was somewhere in the taxes
subdirectory, the command ls ./letters/taxes/*
would show us the sub-directories
of taxes (federal, local, state), and it would show their contents. We could
then look through this output for the file we were looking for.
What if the file we were looking for was five levels deeper? We could keep
adding wildcards (*), one per directory level, until we reached the right
directory. This might work, but what happens if the files were six levels
deeper? Well, we could add an extra wildcard.
What if it were 10 levels deeper and we didn't
know it? We would have to keep guessing, because each wildcard stands for
exactly one directory level: with too few we never reach the file, and with
too many the pattern matches nothing at all.
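To make the one-wildcard-per-level idea concrete, here is a minimal sketch on a throwaway tree. The year and file names are made up for illustration, and the temporary directory stands in for the real ./letters/taxes.

```shell
# Build a small throwaway tree (hypothetical names) under a temporary directory.
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes/2003/federal"
touch "$root/letters/taxes/2003/federal/boat.txt"
cd "$root" || exit 1

# One wildcard: the pattern expands to the year directory, so ls lists its contents.
one_level=$(ls ./letters/taxes/*)

# Two wildcards: the pattern expands to the directory inside the year.
two_levels=$(ls ./letters/taxes/*/*)

# Three wildcards: the pattern finally expands to the file itself.
three_levels=$(ls ./letters/taxes/*/*/*)

# Four wildcards: nothing is that deep, so the pattern matches nothing and ls fails.
too_many=$(ls ./letters/taxes/*/*/*/* 2>/dev/null) || true

echo "$one_level"
echo "$two_levels"
echo "$three_levels"

cd "$start" && rm -rf "$root"
```

Note that only the pattern with exactly the right number of wildcards names the file itself.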
Fortunately for us, we don't have to type in 10 asterisks to get what we
want. We can use the -R option of ls to list the whole directory tree recursively.
The problem is that we now have 1,800 files to look through. Piping them
through more and looking for the right file will be very time consuming. If we
knew that it was there, but we missed it on the first pass, we would have to run
through the whole thing again.
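The figure of 1,800 comes from 5 years x 12 months x 3 kinds of tax x 10 letters. As a rough check, the sketch below builds such a tree in a temporary directory and counts the letters in a recursive listing. It assumes the listing is produced with ls -R, and the directory and file names are invented for the example.

```shell
# Build 5 years x 12 months x 3 tax types x 10 letters in a throwaway directory.
root=$(mktemp -d)
for year in 1 2 3 4 5; do
  for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
    for kind in federal state local; do
      dir="$root/taxes/year$year/$month/$kind"
      mkdir -p "$dir"
      for n in 0 1 2 3 4 5 6 7 8 9; do
        : > "$dir/letter$n.txt"      # create an empty letter
      done
    done
  done
done

# ls -R prints one entry per line when its output goes to a pipe;
# count only the lines that are letters, not directory names or headers.
file_count=$(ls -R "$root/taxes" | grep -c '^letter')
echo "$file_count"

rm -rf "$root"
```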
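The alternative is to have the more command search for the right file for you. Because
the output is more than one screen, more will display the first screen and at
the bottom display --More--. At that prompt, typing a slash (/) followed by a
pattern tells more to search forward for it.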
The problem here is the output of the ls command. We can find out whether a
file exists by this method, but we cannot really tell where it is. If you try
this, you will see that more jumps to the spot in the output where the file is
(if it is there). However, all we see is the file name, not what directory it is
in. Actually, this problem exists even if we don't execute a search.
If you run more on a file directly rather than at the end of a pipe,
instead of just
seeing --More--, you will probably see something like --More--(16%).
This means that you have read 16 percent of the file.
However, we don't need to use more for that. Because we don't want to look
at the entire output (just search for a particular file), we can use one of three
commands that Linux provides to do pattern searching: grep, egrep, and fgrep.
The names sound a little odd to the Linux beginner, but grep stands for
global regular expression print. The other two are
newer versions that do similar things. For example, egrep searches for patterns
that are full regular expressions and fgrep searches for fixed strings and is a
bit faster. We go into details about the grep command in the section on
looking through files.
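The difference between the three is easiest to see on a tiny input. The sketch below uses grep -E and grep -F, which POSIX defines as the equivalents of egrep and fgrep; the three sample names are made up.

```shell
# Three sample names, one per line (hypothetical).
names='boat
b.at
beat'

# Basic regular expression: the dot matches any single character.
basic=$(printf '%s\n' "$names" | grep 'b.at')

# Fixed string (what fgrep does): the dot is taken literally.
fixed=$(printf '%s\n' "$names" | grep -F 'b.at')

# Extended regular expression (what egrep does): | means "or".
extended=$(printf '%s\n' "$names" | grep -E 'boat|beat')

echo "basic:";    echo "$basic"
echo "fixed:";    echo "$fixed"
echo "extended:"; echo "$extended"
```

Here the basic pattern matches all three names, the fixed string matches only b.at, and the extended pattern matches boat and beat.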
Let's assume that we are tax consultants and have 50 subdirectories, one for each client.
Each subdirectory is further broken down by year and type of tax
(state, local, federal, sales, etc.). A couple years ago, a client
of ours bought a boat. We have a new client
who also wants to buy a boat, and we need some information in that old file.
Because we know the name of the file, we can use grep to find it, like this:
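As a hedged sketch, assuming the recursive listing comes from ls -R and using a throwaway tree with made-up client and file names, the pipeline could look like this:

```shell
# Throwaway tree standing in for ./letters/taxes (client and file names are invented).
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes/smith/federal"
touch "$root/letters/taxes/smith/federal/boat.txt" \
      "$root/letters/taxes/smith/federal/house.txt" \
      "$root/letters/taxes/smith/federal/letter.boat"
cd "$root" || exit 1

# List everything below taxes and keep only the lines containing "boat".
matches=$(ls -R ./letters/taxes | grep boat)
echo "$matches"

cd "$start" && rm -rf "$root"
```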
If the file is called boats, boat.txt, boats.txt, or letter.boat, grep
will find it because grep is only looking for the pattern boat. Because that
pattern exists in all four of those file names, all four would be potential matches.
The problem is that the file may not be called boat.txt, but rather
Boat.txt. Remember, unlike DOS,
UNIX is case-sensitive. Therefore, grep sees boat.txt and Boat.txt as different files.
The solution here would be to tell grep to look for both.
Remember our discussion on regular expressions in
the section on shell basics? Not only can we use regular expressions for file
names, we can use them in the arguments to commands. The term
regular expression is even part of grep's
name. Using regular expressions, we could give grep the pattern '[Bb]oat' instead of boat.
This would now find both boat.txt and Boat.txt.
Some of you may see a problem with this as well. Not only does Linux see a
difference between boat.txt and Boat.txt, but also between Boat.txt and
BOAT.TXT. To catch all possibilities, we would need a pattern something
like [Bb][Oo][Aa][Tt].
Although this is perfectly correct syntax and it will find the files no
matter what case the word "boat" is in, it is too much work. The programmers who
developed grep realized that people would want to look for things regardless of
what case they are in. Therefore, they built in the -i option, which simply says
ignore the case. With it, the pattern boat given to grep -i
will not only find boats, boat.txt, boats.txt, and letter.boat, but it will
also find Boat.txt and BOAT.TXT.
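The following sketch compares the patterns discussed so far on a made-up list of file names, fed to grep with printf in place of a real directory listing. The -c option makes grep print the number of matching lines.

```shell
# A hypothetical listing, one file name per line.
names='boats
boat.txt
letter.boat
Boat.txt
BOAT.TXT
house.txt'

# Lowercase only: misses Boat.txt and BOAT.TXT.
plain=$(printf '%s\n' "$names" | grep -c 'boat')

# Either case for the first letter: still misses BOAT.TXT.
capital=$(printf '%s\n' "$names" | grep -c '[Bb]oat')

# Either case for every letter: correct, but tedious to type.
spelled=$(printf '%s\n' "$names" | grep -c '[Bb][Oo][Aa][Tt]')

# -i ignores case and gives the same result with less typing.
ignore=$(printf '%s\n' "$names" | grep -ci 'boat')

echo "$plain $capital $spelled $ignore"
```

The patterns are quoted so that the shell passes the square brackets to grep unchanged.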
If you've been paying attention, you might have noticed something. Although
the grep command will tell you about the existence of a file, it won't tell you
where it is. This is just like piping it through more. The only
difference is that we're filtering out something. Therefore, it still won't tell
you the path.
Now, this isn't grep's fault. It did what it was supposed to do. We told it
to search for a particular pattern and it did. Also, it displayed that pattern
for us. The problem is still the fact that the ls command is not displaying the
full paths of the files, just their names.
Instead of ls, let's use a different command. Let's use find instead. Just as
its name implies, find is used to find things. What it finds is files. If we
change the command to look like this:
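A minimal sketch of such a pipeline, run on a throwaway tree with invented client and file names, might be:

```shell
# Throwaway tree standing in for ./letters/taxes (names are invented).
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes/smith/federal"
touch "$root/letters/taxes/smith/federal/Boat.txt" \
      "$root/letters/taxes/smith/federal/house.txt"
cd "$root" || exit 1

# find prints the full path of every entry; grep -i keeps the ones containing "boat".
found=$(find ./letters/taxes -print | grep -i boat)
echo "$found"

cd "$start" && rm -rf "$root"
```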
This finds what we are looking for and gives us the paths as well.
Before we go on, let's look at the syntax of the find command. There are a
lot of options, and it does look forbidding at first. We find it is easiest to
think of it this way:
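One way to picture it is as three parts: where to start, what to look for, and what to do with each match. The sketch below labels those parts on a throwaway tree. The labels are our own shorthand rather than find's terminology, and the file names are made up.

```shell
# Throwaway tree standing in for ./letters/taxes (file names are invented).
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes"
touch "$root/letters/taxes/boat.txt" "$root/letters/taxes/house.txt"
cd "$root" || exit 1

#    where             search criteria    what to do
find ./letters/taxes   -name 'boat.txt'   -print

# The search criteria are optional: with none, every entry under "where" is printed.
everything=$(find ./letters/taxes -print | grep -c .)
only_boat=$(find ./letters/taxes -name 'boat.txt' -print)
echo "$everything entries in total, one of them is $only_boat"

cd "$start" && rm -rf "$root"
```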
In this case, the "where" is ./letters/taxes. Therefore, find starts its
search in the ./letters/taxes directory. Here, we have no search criteria; we
simply tell it to do something. That something was to print each path it found (-print).
We also need to be careful because the find command we are using will also
find directories named boat. This is because we did not specify any search
criteria. If instead we wanted it just to look for regular files
(which is often a good idea), we could change the command to look like this:
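As a hedged sketch, assuming the restriction is done with find's -type f test, the two variants can be compared on a throwaway tree that contains a directory named boat (all names are invented):

```shell
# Throwaway tree with both a directory and a file whose names contain "boat".
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes/boat"
touch "$root/letters/taxes/boat/invoice.txt" "$root/letters/taxes/boat.txt"
cd "$root" || exit 1

# No criteria: the directory ./letters/taxes/boat is printed too.
all=$(find ./letters/taxes -print | grep -i boat)

# -type f: only regular files are printed, so the directory itself drops out.
# invoice.txt still matches, because grep sees "boat" in its path.
files_only=$(find ./letters/taxes -type f -print | grep -i boat)

echo "$all"
echo "---"
echo "$files_only"

cd "$start" && rm -rf "$root"
```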
Here we see the option -type f, which restricts the search to regular files.
Too complicated? Let's make things easier by avoiding grep. There are many
different things that we can use as search criteria for find. Take a quick look
at the man-page
and you will see that you can search by owner
and even by name. Instead of having grep do the search for
us, let's save a step (and time) by having find do the search for us. The command
would then look like this:
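A minimal sketch using the -name test on a throwaway tree (the client directory and file names are made up):

```shell
# Throwaway tree with several similarly named files.
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes/smith" "$root/letters/taxes/jones"
touch "$root/letters/taxes/smith/boat" \
      "$root/letters/taxes/smith/boat.txt" \
      "$root/letters/taxes/smith/boats.txt" \
      "$root/letters/taxes/jones/Boat"
cd "$root" || exit 1

# -name compares the whole file name, so only the file called exactly "boat" matches.
exact=$(find ./letters/taxes -name boat -print)
echo "$exact"

cd "$start" && rm -rf "$root"
```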
This will find any file named boat and list its respective path. The problem
here is that it will only find the files named boat. It won't find the files
boat.txt, boats.txt, or even Boat.
The nice thing is that the -name option of find understands the same wildcard
patterns that the shell does (not full regular expressions), so we
could issue the command with the pattern '[Bb]oat' in place of boat.
(Note that we included the single quotes (') to keep the square brackets ([ ]) from
being interpreted first by the shell.)
This command tells find to look for all files named either boat or Boat.
However, this won't find BOAT. We are almost there.
We have two alternatives. One is to expand the pattern to include all possibilities, as in '[Bb][Oo][Aa][Tt]'.
This will find all the files with any combination of those four letters and
print them out. However, it won't find boat.txt. Therefore, we need to change it
yet again. This time we have
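As a hedged sketch of how the pattern grows from step to step, the block below counts the matches of each variant on a throwaway tree. The count helper and all file names are made up for the example.

```shell
# Throwaway tree; the differently cased names live in separate directories.
start=$PWD
root=$(mktemp -d)
mkdir -p "$root/letters/taxes/smith" "$root/letters/taxes/jones" "$root/letters/taxes/brown"
touch "$root/letters/taxes/smith/boat" \
      "$root/letters/taxes/jones/Boat" \
      "$root/letters/taxes/brown/BOAT" \
      "$root/letters/taxes/smith/boat.txt" \
      "$root/letters/taxes/smith/newboat.txt" \
      "$root/letters/taxes/smith/house.txt"
cd "$root" || exit 1

# Hypothetical helper: print how many names match the given -name pattern.
count() { find ./letters/taxes -name "$1" -print | grep -c .; }

first_letter=$(count '[Bb]oat')            # boat, Boat
every_letter=$(count '[Bb][Oo][Aa][Tt]')   # adds BOAT
with_suffix=$(count '[Bb][Oo][Aa][Tt]*')   # adds boat.txt
both_ends=$(count '*[Bb][Oo][Aa][Tt]*')    # adds newboat.txt

echo "$first_letter $every_letter $with_suffix $both_ends"

cd "$start" && rm -rf "$root"
```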
Here we have passed the wildcard
(*) to find to tell it to find anything
that starts with "boat" (upper- or lowercase), followed by anything else. If we
add an extra asterisk at the front of the pattern as well,
we not only get boat.txt, but also newboat.txt, which the first example would
have missed.
This works. Is there an easier way? Well, sort of. There is a way that is
easier in the sense that there are fewer characters to type in. This is to pipe
the output of find through grep -i boat.
Isn't this the same command that we issued before? Yes, it is. In this
particular case, this combination of find and grep is the easier solution,
because all we are looking for is the path to a specific file. However, these
examples show you different options of find and different ways to use them. That's one
of the nice things about Linux. There are many ways to get the same result.
Note that more recent versions of find do not require the -print option, as
this is the default behavior.
Looking for files with specific names is only one use of find. However, if you
look at the find man-page, you will see there are many other options you can
use. One thing I frequently do is to look for files that are older than a specific
age. For example, on many systems, I don't want to hang on to log files that
are older than six months. Here I could use the -mtime option, as in find /usr/log/mylogs -mtime +180,
which says to find everything in the /usr/log/mylogs directory that is older
than 180 days (not exactly six months, but it works). If I wanted, I could have
used the -name option to specify a particular file pattern:
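A minimal sketch, assuming the age test is -mtime +180 and using a temporary directory in place of /usr/log/mylogs. The file names and the *.log pattern are made up, and the old files are simply backdated with touch.

```shell
# Temporary directory standing in for the log directory.
logs=$(mktemp -d)
touch "$logs/new.log"

# touch -t takes [[CC]YY]MMDDhhmm; this backdates two files to 1 January 2000.
touch -t 200001010000 "$logs/old.log" "$logs/old.txt"

# Everything last modified more than 180 days ago.
old_any=$(find "$logs" -type f -mtime +180 -print | grep -c .)

# The same test, limited to names ending in .log.
old_logs=$(find "$logs" -type f -name '*.log' -mtime +180 -print)

echo "$old_any old files, of which the log file is $old_logs"

rm -rf "$logs"
```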
One problem with this is deciding what determines how "old" a file is. The first answer
for many people is that the age of a file is how long it has been since the
file was created. Well, if I created a file two years ago, but added new data to
it a minute ago, is it "older" than a file that I created yesterday, but have
not changed since then? It really depends on what you are interested in. For log
files, I would say that the time the data in the file was last changed is more
significant than when the file was created. Therefore, the -mtime option, which
goes by the last modification time, is the right choice here.
However, that's not
always the case. Sometimes, you are interested in the last time the file was used,
or accessed. This is when you would use the -atime option. This is helpful in
finding old files on your system that no one has used for a long time.
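To see that the two options really look at different timestamps, the sketch below backdates only the access time of one file. The file names are made up; touch -a changes the access time and leaves the modification time alone.

```shell
# Two files in a temporary directory, both modified just now.
dir=$(mktemp -d)
touch "$dir/used.txt" "$dir/stale.txt"

# Backdate only the access time of stale.txt to 1 January 2000.
touch -a -t 200001010000 "$dir/stale.txt"

# By access time, stale.txt is old; by modification time, nothing is.
by_access=$(find "$dir" -type f -atime +180 -print)
by_modify=$(find "$dir" -type f -mtime +180 -print)

echo "not accessed for 180 days: $by_access"
echo "not modified for 180 days: ${by_modify:-none}"

rm -rf "$dir"
```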
You could also use the -ctime option, which goes by when the file's status (owner, permissions, and so on) last changed.
Among the files that we specifically monitor are /etc/passwd and
/etc/shadow. Interestingly enough, we want one of them, /etc/shadow, to change
once a month. This is our "proof" that the root password was
changed as it should be at regular intervals. Note that we have other mechanisms
to ensure that it was the root password that was changed and not simply
something else in the file, but you get the idea. One place you see this
mechanism at work is your /usr/lib/cron/run-crons file, which is started from
/etc/crontab every 15 minutes.
One shortcoming of -mtime and the others is
that they measure time in 24-hour increments starting from now. That means that
you cannot find anything that was changed within the last hour, for example. For
this, newer versions of find have the -cmin, -amin, and -mmin options, which
measure time in minutes. So, to find all of the files changed within the last
hour (i.e., the last 60 minutes), we might have something like this:
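A minimal sketch, assuming a find that supports -mmin (GNU and BSD find do; it is not part of POSIX). It uses a temporary directory with made-up file names and shows both the minus and the plus form.

```shell
# One file modified just now, one backdated to 1 January 2000.
dir=$(mktemp -d)
touch "$dir/fresh.txt"
touch -t 200001010000 "$dir/old.txt"

# -mmin -60: modified less than 60 minutes ago.
last_hour=$(find "$dir" -type f -mmin -60 -print)

# -mmin +60: modified more than 60 minutes ago.
older=$(find "$dir" -type f -mmin +60 -print)

echo "last hour: $last_hour"
echo "older:     $older"

rm -rf "$dir"
```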
In this example, the value was preceded with a minus sign (-), which means
that we are looking for files with a value less than what we specified.
In this case, we want values less than 60 minutes. In the earlier -mtime example,
we used a plus sign (+) before the value, which means values greater than what we
specified. If you use neither one, then the time must be exactly what you
specified. In the same vein are the options -