When Things Go Wrong
Until you become very accustomed to using Linux you're likely to make mistakes
(which also happens to people who have been working with Linux for a long
time). In this section, we'll be talking about some common mistakes and
problems that occur when you first start using Linux.
Usually when you make mistakes the system will let you know in some way.
When using the command line, the system will tell you in the form
of error messages. For example, if you try to execute a command and the
command does not exist, the system may report something like this:
bash: some_command: command not found
Such an error can also occur if the command exists but does not reside
in a directory in your search path. You can find more about this in
the section on directory paths.
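One way to check whether the shell can find a command before you run it is the command -v builtin, which prints what the shell would execute and fails if nothing in the search path matches. The following is only a minimal sketch; the helper name in_path and the command names are made up for illustration.

```shell
# Report whether a command can be found in the current search path.
# in_path is a hypothetical helper name, not a standard tool.
in_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not in search path"
    fi
}

in_path sh
in_path some_command

# The directories the shell searches, separated by colons.
echo "$PATH"
```

If a command you know is installed shows up as "not in search path", its directory is probably missing from the PATH variable.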
The system may still report an error even if it can execute the command, for
example if the command acts on a file that does not exist. The more command,
for instance, displays the contents of a file. If the file you
want to look at does not exist, you might get the error:
some_file: No such file or directory
In the first example, the error came from your shell as it
tried to execute the command.
In the second case, the error came from
the more command as it encountered the error when trying
to access the file.
In both these cases, the problem is pretty obvious. In other cases,
you cannot always be sure. Often you include such commands within shell
scripts and want to change the flow of the script based on the failure or success
of the program. When a command ends, it provides its "exit code" or
"return code" in the special variable $?. So after a command fails,
running this command will show you the exit code:
echo $?
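Inside a script you rarely print the exit code by hand; you usually let the shell branch on it. The sketch below assumes the cat command as a stand-in for any program that reads a file, and check_file is a hypothetical helper name. The file names are placeholders.

```shell
# Run a command and branch on its exit code.
# check_file is a made-up helper; cat stands in for any file-reading program.
check_file() {
    if cat "$1" >/dev/null 2>&1; then
        echo "ok"
    else
        # Inside the else branch, $? still holds the exit code of cat.
        echo "failed with exit code $?"
    fi
}

check_file /dev/null
check_file some_file_that_does_not_exist

# A command the shell cannot find also sets an exit code.
some_command_that_does_not_exist 2>/dev/null || echo "shell reported exit code $?"
```

An exit code of 0 means success; anything else means the program reported a problem.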
Note that it is up to the program to provide both the text message and the
return code. Sometimes you end up with a text message that does not make sense
(or there is no text at all), so all you get is the return code, which is
probably even less understandable. Some programs use the system error numbers
as return codes, so the file /usr/include/asm/errno.h can help translate a code
into a text message. This is only a convention, though, so the program's man page
is the more reliable place to look.
You need to be aware that errors on one system (i.e., one Linux distribution) are not necessarily
errors on other systems. For example, if you forget the space in the command ls -l and type
ls-l instead, some distributions will give you a "command not found" error.
However, on SUSE Linux, this will generate the same output as if you had not forgotten the space.
This is because ls-l is an alias for the
command ls -l. As the name implies, an alias is a way of referring to something
by a different name. For details, take a look at the section on aliases.
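To make the idea concrete, here is a small sketch that defines the same alias by hand and then asks the shell what the name stands for. It assumes a shell such as bash that accepts a hyphen in an alias name. The definition your distribution ships, if any, may be worded differently.

```shell
# Define the alias by hand (assumption: your shell allows "-" in alias names).
alias ls-l='ls -l'

# Print the definition to confirm what the name now stands for.
alias ls-l
```

Typed at an interactive prompt, ls-l would now behave like ls -l. Note that bash does not expand aliases inside scripts unless told to, so this is mainly a convenience for the command line.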
It has happened before that I did a directory listing and saw a particular
file, but when I tried to remove it, the system told me the file did not
exist. The most likely explanation is that I misspelled the filename, but
that wasn't it. What can happen sometimes is that a
control character ends up becoming part of the filename. This typically happens
with the backspace, as it is not always defined as the same character on every system.
The listing shows what looks like a normal name, but when you try to erase the file
you get an error message. To see any "non-printable" characters,
you need a listing that displays them explicitly. In the case I have in mind, such a
listing says the file name actually contains two o's and a trailing backspace. Since the backspace
erased the last 'o' in the display, you do not see it when the file name is displayed normally.
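The following sketch reproduces the situation in a scratch directory. The file name (foo plus a backspace) is an assumption chosen only to match the description, and piping the listing through cat -v is just one way to make control characters visible, not necessarily the one the text originally showed.

```shell
# Work in a scratch directory so nothing else is touched.
dir=$(mktemp -d)

# Create a file whose name is "foo" followed by a backspace character.
name=$(printf 'foo\b')
touch "$dir/$name"

# cat -v prints control characters in caret notation; backspace becomes ^H.
visible=$(ls "$dir" | cat -v)
echo "$visible"

# Clean up, quoting the variable so the odd character is passed through intact.
rm -- "$dir/$name"
rmdir "$dir"
```

Once you can see the hidden character, you know the name you typed by hand never matched the real one.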
Sometimes you lose
control of programs and they seem to "run away". In other cases, a program may seem to
hang and freeze your terminal. Although it is possible because of a bug in the software or a
flaky piece of hardware, oftentimes the user makes a mistake he was not even aware of. This
can be extremely frustrating for the beginner, since you do not even know how you got
yourself into the situation, let alone how to get out.
When I first started learning Unix (even before Linux was born) I would start programs and
quickly see that I needed to stop them. I knew I could stop the program with some combination
of the control key and some other letter. In my rush to stop the program, I
would press the control key and many different letters in sequence. On some occasions,
the program would simply stop and go no further. On other occasions, the program would appear to
stop, but I would later discover that it was still running. What happened was that I hit a
combination that did not stop the program but did something else.
In the first example, where the program would stop and go no further, I had "suspended" the
program. In essence, I'd put it to sleep and it would wait for me to tell it to start up again.
This is typically done by pressing CTRL-Z.
In the second example, where the program seemed to have disappeared, I had also suspended
the program but at the same time had put it in the "background".
This special feature of Unix shells dates from the time before graphical interfaces were common. It was a great waste
of time to start a program and then have to wait for it to complete, when all you were
interested in was the output, which you could simply write to a file. Instead you put a program
in the background and the shell returned to the prompt, ready for the next command. It's
sometimes necessary to do this once a command is started, which you do by pressing CTRL-Z,
which suspends the program, but returns to the prompt. You then issue the bg command, which
starts the previous command in the background. (This is all part of "job control" which is
discussed in another section.)
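CTRL-Z and bg only make sense at an interactive prompt, but the underlying idea can be sketched in a script by starting the command in the background right away with an ampersand. The commands and the temporary file here are placeholders for a real long-running job.

```shell
# A temporary file to collect the output of the background job.
out=$(mktemp)

# The trailing & starts the job in the background; the shell returns at once.
( sleep 1; echo "done" ) > "$out" &
job_pid=$!    # $! holds the PID of the most recent background command

echo "shell is free again while job $job_pid runs"

# wait blocks until the background job has finished.
wait "$job_pid"

result=$(cat "$out")
echo "job wrote: $result"
rm -f "$out"
```

At an interactive prompt, the jobs command lists what is running in the background, and fg brings a job back to the foreground.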
To stop the program, what I actually wanted to do was to "interrupt" it. This is typically done
by pressing CTRL-C. What this actually does is send a signal to the program, in this case an
interrupt signal. You can define which signal is sent when you press any given combination of keys.
We talk about this in the section on terminal settings.
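To see that the interrupt really is just a signal, the sketch below installs a handler for it in a script and then sends the same signal with the kill command instead of the keyboard. The variable name interrupted is made up for the example.

```shell
# Install a handler for the interrupt signal (SIGINT).
interrupted=no
trap 'interrupted=yes' INT

# Send this shell the same signal the terminal would send for the interrupt key.
kill -INT $$

echo "interrupted: $interrupted"

# Restore the default behaviour for the signal.
trap - INT
```

On a terminal, stty -a lists the current key assignments; the entry labelled intr is the interrupt key.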
When you put a command in the background that sends output to the screen, you need to be
careful about running other programs in the meantime. What could happen is that your output
gets mixed up, making it difficult to see which output belongs to which command.
There have been occasions where I have issued a command and the shell jumps to the next line,
then simply displays a greater than symbol (>). What this often means is that the shell does
not think you are done with the command. This typically happens when you are enclosing
something on the command line in quotes and you forget to close the quotes.
For example, if I wanted to search
for my name in a file, I would use the grep command. If I were to type my first and last name
as two separate words, grep would take the first word as the search pattern and
I would get an error message saying that the file "Mohr" did not exist.
To issue this command correctly, I would have to include my name inside quotes.
However, if I forgot the final quote, for example, the shell would not think the command was
done yet and would perceive the enter key that I pressed as part of the command. What I
would need to do here is to interrupt the command, as we discussed previously. Note this
can also happen if you use single quotes. Since the shell does not see any difference
between a single quote and an apostrophe, you need to be careful with what you type. For
example, if I wanted to print the phrase "I'm Jim", I might be tempted to enclose it in single quotes.
However, the system does not understand contractions and thinks I have not finished the command.
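The quoting rules above can be sketched as follows. The sample file and its contents are invented for the example, and the commented-out line shows the unquoted form that goes wrong.

```shell
# A small sample file to search; its contents are made up for illustration.
file=$(mktemp)
printf 'Jim Mohr\nJane Doe\n' > "$file"

# Quoted: the whole name is one pattern, and the file is the second argument.
match=$(grep "Jim Mohr" "$file")
echo "$match"

# Unquoted, grep would take Jim as the pattern and look for a file named Mohr:
#   grep Jim Mohr "$file"

# Double quotes let an apostrophe appear inside the string.
phrase="I'm Jim"
echo "$phrase"

rm -f "$file"
```

If you do end up at the > prompt because of an unclosed quote, interrupting the command gets you back to the normal prompt.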
As we will discuss in the section on pipes and redirection, you can send
the output of a command to a file. This
is done with the greater than symbol (>). The generic syntax looks like this:
command > filename
This can cause problems if the command you issue expects more arguments than you gave it.
For example, suppose I were searching the contents of a file for occurrences of a particular
phrase and redirected the output to a file, but forgot to name the file to search.
What would happen is the shell would drop down to the next line and simply wait forever or
until you interrupted the command. The reason is that the grep command can also take its input
from the keyboard (standard input). It is waiting for you to type in text before it will begin searching.
Then, if it finds the phrase you are looking for, it will write it into the file. If that's not what
you want, the solution here is also to interrupt the command. You can also enter the end of
file character (CTRL-D), which would tell grep to stop reading input.
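The sketch below shows the same behaviour without the hang: grep is given no file name, but a pipe supplies the input instead of the keyboard, so the command finishes on its own. The text and the word phrase are placeholders.

```shell
# A temporary file to receive the redirected output.
out=$(mktemp)

# With no file name, grep reads standard input; here a pipe supplies it.
printf 'first line\nthe phrase is here\nlast line\n' | grep phrase > "$out"

found=$(cat "$out")
echo "$found"
rm -f "$out"
```

Without the pipe, the same grep command would sit waiting for you to type, which is exactly the situation described above.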
One thing to keep in mind is that you can put a program in the background even if the shell
does not understand job control. In this case, it is impossible to bring the command back to the
foreground in order to interrupt. You need to do something else. As we discussed earlier, Linux
provides you a tool to display the processes which you are currently running (the ps
command). Simply typing ps on the command line might give you something like this:
PID TTY TIME CMD
29518 pts/3 00:00:00 bash
30962 pts/3 00:00:00 ps
The PID column in the ps output is the process identifier (PID).
If not run in the background, a child process will continue to do its job until it's finished
and then report back to its parent when it is done. A little housecleaning is done and the
process disappears from the system. However, sometimes the child doesn't end like it is
supposed to. One case is when it becomes a "runaway" process. There are a number of causes of
runaway processes, but essentially it means that the process is no longer needed but does not
disappear from the system.
The result of this is often that the parent cannot end either. In general, the parent should not
end until all of its children are done (however, there are cases where it is desired). If
processes continue to run, they take up resources and can even bring the system to a standstill.
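The parent and child relationship can be observed directly. This sketch starts a short-lived child, looks it up with ps, and waits for it to report back. It assumes a ps that understands the -o and -p options, as the common Linux version does, and the sleep command is only a placeholder for real work.

```shell
echo "this shell has PID $$"

# Start a child in the background; $! holds its PID.
sleep 2 &
child=$!
echo "child has PID $child"

# Show the child's PID, its parent's PID (PPID), and its command name.
# If ps is missing or does not know these options, say so and carry on.
ps -o pid,ppid,comm -p "$child" || echo "ps could not show the process"

# Wait for the child to finish and report back to its parent.
wait "$child"
echo "child finished with exit code $?"
```

The PPID column for the child should match the PID of the shell that started it.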
In cases where you have "runaway" processes, or any other time a process is running that
you need to stop, you can send the process a signal to stop execution if
you know its PID. This is done with the kill command, and the syntax is quite simple:
kill followed by the PID.
By default, the kill command sends a termination signal to that process. Unfortunately,
there are some cases where a process can ignore that termination signal. However, you can
send a much more urgent "kill" signal like this:
kill -9 PID
Here, "9" is the number of the SIGKILL or kill signal. In general, you should first try
to use signal 15 or SIGTERM. This sends a terminate signal and gives the process a chance
to end "gracefully". You should also look to see if the process you want to stop has any
child processes.
For details on what other signals can be sent
and the behavior in different circumstances, look at the kill man-page
or simply try kill -l, which lists the available signals:
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGBUS 8) SIGFPE
9) SIGKILL 10) SIGUSR1 11) SIGSEGV 12) SIGUSR2
13) SIGPIPE 14) SIGALRM 15) SIGTERM 17) SIGCHLD
18) SIGCONT 19) SIGSTOP 20) SIGTSTP 21) SIGTTIN
22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO
30) SIGPWR 31) SIGSYS 35) SIGRTMIN 36) SIGRTMIN+1
37) SIGRTMIN+2 38) SIGRTMIN+3 39) SIGRTMIN+4 40) SIGRTMIN+5
41) SIGRTMIN+6 42) SIGRTMIN+7 43) SIGRTMIN+8 44) SIGRTMIN+9
45) SIGRTMIN+10 46) SIGRTMIN+11 47) SIGRTMIN+12 48) SIGRTMIN+13
49) SIGRTMIN+14 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8
57) SIGRTMAX-7 58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4
61) SIGRTMAX-3 62) SIGRTMAX-2 63) SIGRTMAX-1 64) SIGRTMAX
Keep in mind that sending signals to a process is not just to kill a process. In fact, sending
signals to processes is a common way for processes to communicate with each other. You can find
more details about signals in the section on interprocess communication.
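As a rough illustration of the difference between the two signals, the sketch below starts a harmless process twice and stops it once with the default SIGTERM and once with SIGKILL. The sleep command is only a stand-in for a runaway program. The exit codes shown by wait follow the convention in bash of 128 plus the signal number.

```shell
# Start a process to practise on.
sleep 60 &
pid=$!

# Ask politely first: with no option, kill sends SIGTERM (15).
kill "$pid"
term_status=0
wait "$pid" || term_status=$?
echo "after SIGTERM the exit code was $term_status"

# A second process, this time stopped with SIGKILL (9), which cannot be ignored.
sleep 60 &
pid=$!
kill -9 "$pid"
kill_status=0
wait "$pid" || kill_status=$?
echo "after SIGKILL the exit code was $kill_status"
```

Because a process cannot catch SIGKILL, it gets no chance to clean up after itself, which is why SIGTERM is the better first choice.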
In some circumstances, it is not easy to kill processes by their PID. For example, if
something starts dozens of other processes, it is impractical to try to input all of their PIDs.
To solve this problem, Linux has the killall command, which takes the
command name instead of the PID. You can also use the
There have been cases where I have frantically tried to stop a runaway program and
It has happened to me a number of times that the screen saver was activated and it was as if the
system had simply frozen. There were no error messages, no keys worked, and the machine did not even
respond across the network (telnet, ping, etc.). Unfortunately, the only thing to do in this case is
to turn the computer off and then on again.
On the other hand, you can prevent these problems in advance. The most likely cause is that the
Advanced Power Management (APM) is having problems. In this case, you should disable APM within the
system BIOS. Some machines also have something called "hardware monitoring". This can cause problems
as well, and should be disabled.
Problems can also be caused by the Advanced Programmable Interrupt Controller (APIC). This can be deactivated
by changing the boot string used by either LILO or GRUB. In addition, you can disable it by adding
"disableapic" to your boot line.