|
Debugging software crashes is one of the most difficult parts of real-time
and embedded software development. Software crashes when an application
performs an illegal operation and the operating system is forced to abort the
execution of the application. Here we will discuss several causes of crashes in
typical embedded applications. A good understanding of how C code translates to
assembly is helpful in following the material described here.
The following software problems lead to crashes:
Invalid array indexing is one of the biggest sources of crashes in C and C++
programs. Neither language performs array bounds checking, so invalid
array indexing often goes undetected during testing. Out-of-bounds array
indexing corrupts data structures that are allocated in memory after the array.
Another point often missed in analyzing array indexing problems is that
invalid array indexing can also corrupt data structures declared before the
array. This happens when the array is indexed with a very large unsigned number
that represents a negative number in signed arithmetic. Consider the array b
in the declaration below, which is accidentally indexed with the number
0xFFFFFFFF. On a 32-bit target this bit pattern is -1 when interpreted as a
signed integer, so the access is treated as an access to index -1. Assuming the
variables are laid out in memory in declaration order, this access will corrupt
the variable declared before the array, i.e. the memory allocated to a. If the
array is indexed with an index greater than 99, it will corrupt c.
| Array Declaration |
Data1 a; // Corrupted when b is indexed with 0xFFFFFFFF (-1)
int b[100]; // Declaration of b. Keep in mind that array indexing is a signed operation
Data2 c; // Corrupted when index into b is greater than 99
|
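One common defense is to route every write to the array through a checked accessor. The sketch below is only an illustration: `SetB`, `IsMinusOne` and `B_SIZE` are hypothetical names that do not appear in the original, and the array is assumed to hold 100 ints as in the declaration above. Taking the index as an unsigned `size_t` means a "negative" index arrives as a very large value, so a single comparison rejects both kinds of out-of-range access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define B_SIZE 100   /* assumed size, matching int b[100] */

static int b[B_SIZE];

/* Hypothetical helper: stores value in b[index] only if index is in range.
 * An index of -1 converted to size_t becomes a huge unsigned value, so the
 * same comparison catches indices below 0 and above 99. */
bool SetB(size_t index, int value)
{
    if (index >= B_SIZE) {
        return false;   /* reject instead of corrupting a neighbour */
    }
    b[index] = value;
    return true;
}

/* Shows that 0xFFFFFFFF and -1 share one 32-bit pattern: converting -1 to
 * an unsigned 32-bit type is well defined and yields 0xFFFFFFFF. */
bool IsMinusOne(uint32_t raw)
{
    return raw == (uint32_t)-1;
}
```

A caller would test the return value of `SetB` and treat `false` as a software fault to be logged, rather than letting the write proceed.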
Uninitialized pointer operations are another major cause of crashes in C and
C++ programs. This problem is so acute that languages like Java and C# restrict
or disallow raw pointer operations. If a pointer is not initialized before it is
used, the access can corrupt almost any area of memory. The resulting crashes
can be hard to trace, because the pointer causing the memory corruption might
be located in a completely unrelated area of the code. Uninitialized
pointers can also lead to unexpected behavior when the memory map of the
application is modified. This happens when an uninitialized pointer operation
has been corrupting an unused memory block. Shifting the memory map or resizing
data structures might cause the same access to modify memory that is in use.
This type of problem should be suspected when a developer has just changed the
size of some data structure and a previously stable application starts crashing.
A special case of this problem is an invalid access resulting from an attempt to
read or write using a NULL pointer. Here the detection of the problem is very
much hardware dependent. On some platforms, accessing memory for read or write
using a NULL pointer will result in an exception. On other platforms, a read
using a NULL pointer might go undetected but a write operation results in a
crash. On yet other architectures, both read and write accesses using NULL
pointers might go undetected.
Another special condition is described below. If UpdateTerminalInfo is called
with an uninitialized pointer, the program might not crash when status is
updated in the structure, yet crash in UpdateAdditionalInfo when the info
variable is updated. This can happen if the beginning of the structure maps to
a valid address but the elements that follow map to illegal addresses.
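| Uninitialized Pointer Crash |
typedef struct
{
    int status;
    . . .
    int info;
} TerminalInfo;
// Declared before use so that UpdateTerminalInfo can call it
void UpdateAdditionalInfo(TerminalInfo *pTermInfo);
void UpdateTerminalInfo(TerminalInfo *pTermInfo)
{
    // May succeed: the start of the structure can map to a valid address
    pTermInfo->status = INSERVICE;
    UpdateAdditionalInfo(pTermInfo);
}
void UpdateAdditionalInfo(TerminalInfo *pTermInfo)
{
    // May crash: this field lies further along and can map to an illegal address
    pTermInfo->info = TERMINAL_INFO;
}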
|
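The distance between the two fields, and one partial defense, can be sketched as follows. Everything here is an assumption made for illustration: `TerminalInfoSketch` stands in for the original structure, with a 4096-byte filler in place of the elided fields, and `InfoOffset` and `SafeUpdateStatus` are hypothetical helpers. A NULL check only helps when pointers are deliberately initialized to NULL; it cannot recognize a pointer that holds arbitrary garbage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TerminalInfo. The filler array only represents
 * the elided fields; its size is an arbitrary assumption. */
typedef struct
{
    int status;
    char filler[4096];
    int info;
} TerminalInfoSketch;

/* Byte distance from the start of the structure to info. With a bad base
 * pointer, status and info can land in different memory regions. */
size_t InfoOffset(void)
{
    return offsetof(TerminalInfoSketch, info);
}

/* Defensive variant: refuses to write through a NULL pointer. This only
 * works if callers initialize their pointers to NULL before use. */
bool SafeUpdateStatus(TerminalInfoSketch *pTermInfo, int status)
{
    if (pTermInfo == NULL) {
        return false;
    }
    pTermInfo->status = status;
    return true;
}
```

Initializing every pointer at its declaration, even to NULL, turns a silent corruption into a failure that this kind of check can report.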
Applications often free an area of memory but continue to use a pointer
to that memory. This can result in hard-to-detect crashes, because the buffer
might already have been reallocated to some other application, which then
behaves unexpectedly. Sometimes this also causes a crash in the
memory management subsystem of the operating system, because the unauthorized
buffer access can corrupt the heap management data structures.
A special case of unauthorized buffer operations is covered below. Here the
buffer is freed in the function and then accessed after it has been freed.
This type of problem might go undetected and might even be
harmless on some systems. However, in a multithreaded design, the buffer might
have already been allocated to a different thread!
| Unauthorized Buffer Operation |
void foo(Data1 *buf)
{
    // buf is freed in this line
    free(buf);
    // An access is attempted to buf even after it has been freed up.
    // This might cause a problem if the thread got descheduled between
    // the free statement and unauthorized buffer operation. The buffer
    // might have already been allocated to a different thread!
    buf->x = NULL;
}
|
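A widely used mitigation is to clear the pointer at the moment the buffer is freed, so that any later use goes through NULL instead of through a stale address. The sketch below assumes a hypothetical `Data1Sketch` type and a hypothetical `ReleaseBuffer` helper, neither of which is from the original. It protects only the pointer that is passed in; other copies of the same address elsewhere in the program remain dangling.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Data1 with a single pointer member. */
typedef struct
{
    void *x;
} Data1Sketch;

/* Frees the buffer and clears the caller's pointer in one step, so a
 * later access is made through NULL rather than through freed memory. */
void ReleaseBuffer(Data1Sketch **ppBuf)
{
    if (ppBuf == NULL) {
        return;
    }
    free(*ppBuf);    /* free(NULL) is defined to do nothing */
    *ppBuf = NULL;
}
```

Calling `ReleaseBuffer(&buf)` twice is harmless, because the second call frees NULL.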
Illegal stack operations can lead to hard-to-detect crashes. This typically
takes place when a program passes a pointer of the wrong type to a function. The
example given below shows a function that expects an integer pointer while
the caller passes a pointer to a character.
| char pointer/int pointer mixup |
int main(void)
{
    char count;
    // The routine expects an int pointer but a char pointer has been passed.
    // Older compilers and non-ANSI C compilers do not catch this error.
    GetCount(&count);
    // The called function was expecting an int (say 4 bytes) variable. It was,
    // however, passed a pointer to a char with only one byte of space. GetCount
    // will still write four bytes, thus corrupting local variables or
    // parameters on the stack.
    return 0;
}
bool GetCount(int *pCount)
{
    . . .
    *pCount = returnValue;
    return true;
}
|
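One way to avoid the mixup is to receive the result in a variable of the type the function actually writes, then narrow it explicitly. In the sketch below, `GetCountSketch` is a hypothetical stand-in for GetCount that returns a placeholder value of 42, and `GetCountAsChar` is an assumed wrapper; neither is from the original.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GetCount: writes a full int through pCount. */
bool GetCountSketch(int *pCount)
{
    if (pCount == NULL) {
        return false;
    }
    *pCount = 42;   /* placeholder value, assumed for illustration */
    return true;
}

/* Receives the count in a correctly sized int, then narrows it to char
 * only after checking that the value fits. */
bool GetCountAsChar(char *pOut)
{
    int temp = 0;
    if (pOut == NULL || !GetCountSketch(&temp)) {
        return false;
    }
    if (temp < CHAR_MIN || temp > CHAR_MAX) {
        return false;   /* value does not fit in a char */
    }
    *pOut = (char)temp;
    return true;
}
```

Declaring a prototype for the callee before the call site also lets a conforming compiler diagnose the mismatched pointer type at compile time.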
Processors detect various exception conditions and abort program execution
when they detect an error condition. A few of these conditions are:
- The application attempted a divide by zero.
- A program running in user mode attempted to execute an instruction that can
only be executed in supervisor (kernel) mode.
- The program attempted to access an illegal address. The address might be out
of range or the program might not have the privilege to perform the access.
For example, a program attempting to write to a read-only segment will cause
an exception.
- The program attempted a misaligned memory access. Many processors restrict
long word accesses to addresses divisible by 4 and raise an exception if a
long word operation is attempted at an address that is not divisible by 4.
(See the byte alignment and ordering article for details.)
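Two of the conditions above can be guarded against in portable C, as the following sketch suggests. `ReadU32Unaligned` and `SafeDivide` are hypothetical helper names, not part of the original. Copying bytes with `memcpy` avoids a long word access at an odd address, and checking the divisor avoids the divide exception.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from any byte address. memcpy makes no alignment
 * assumption, unlike casting p to uint32_t * and dereferencing it. The
 * result uses the host's native byte order. */
uint32_t ReadU32Unaligned(const uint8_t *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Divides only when the operation is defined: a zero divisor and the
 * overflowing case INT_MIN / -1 are both rejected. */
bool SafeDivide(int numerator, int denominator, int *pResult)
{
    if (pResult == NULL || denominator == 0) {
        return false;
    }
    if (numerator == INT_MIN && denominator == -1) {
        return false;
    }
    *pResult = numerator / denominator;
    return true;
}
```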
When a program enters an infinite loop, it might crash due to invalid array
indexing when the loop index exceeds the array bounds and corrupts memory. In
other scenarios, the program continues to loop until a watchdog kicks in and
aborts the program. If watchdog functionality is not supported, the system will
"hang" and never recover from the error. Thus all embedded systems
must be designed to support watchdog reset functionality.
See the article on fault handling
techniques for more details about watchdog handling.
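The watchdog principle can be sketched in plain C as a countdown that healthy code keeps reloading. This is a software simulation under stated assumptions, not a driver for any real watchdog peripheral: `WatchdogKick`, `WatchdogTick`, `WatchdogResetRequested` and the 3-tick timeout are all hypothetical. On real hardware the expiry would reset the processor instead of setting a flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_TIMEOUT_TICKS 3u   /* assumed timeout, in timer ticks */

static uint32_t ticksLeft = WATCHDOG_TIMEOUT_TICKS;
static bool resetRequested = false;

/* Called by healthy application code once per pass through its main loop. */
void WatchdogKick(void)
{
    ticksLeft = WATCHDOG_TIMEOUT_TICKS;
}

/* Called from a periodic timer. If the application stops kicking, for
 * example because it is stuck in an infinite loop, the count runs out. */
void WatchdogTick(void)
{
    if (ticksLeft > 0u) {
        ticksLeft--;
    }
    if (ticksLeft == 0u) {
        resetRequested = true;   /* real hardware would reset here */
    }
}

bool WatchdogResetRequested(void)
{
    return resetRequested;
}
```

As long as the main loop calls `WatchdogKick` more often than the timeout, no reset is requested; three consecutive ticks without a kick request one.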
|