Debugging Software Crashes
Debugging software crashes is one of the most difficult parts of real-time and embedded software development. A crash occurs when an application performs an illegal operation and the operating system is forced to abort its execution. Here we discuss several common causes of crashes in a typical embedded application. A good understanding of how C code maps to assembly will help in following the material described here.
The following software problems lead to crashes:
Invalid Array Indexing
Invalid array indexing is one of the biggest sources of crashes in C and C++ programs. Neither language checks array bounds at run time, so invalid array indexing usually goes undetected during testing. Out-of-bounds indexing corrupts the data structures allocated in memory after the array. A point often missed when analyzing array indexing problems is that invalid indexing can also corrupt data structures declared before the array. This happens when the array is indexed with a very large unsigned number that represents a negative number in signed arithmetic. Consider three variables a, b and c declared one after another, where b is an array of 100 elements, and assume the compiler places them consecutively in memory. If b is accidentally indexed with 0xFFFFFFFF, the access behaves like an access to index -1: a signed 32-bit index holding that bit pattern is -1, and even with an unsigned index the address calculation on a 32-bit processor wraps around to the element just before b. This access will therefore corrupt the variable declared before the array, i.e. the memory allocated to a. If the array is indexed with a value greater than 99, it will corrupt c.
Uninitialized Pointer Operations
Uninitialized pointer operations are another major cause of crashes in C and C++ programs. The problem is serious enough that Java does not expose pointers at all, and C# permits them only in code explicitly marked unsafe. A pointer that is used before it is initialized holds an indeterminate value, so writing through it can corrupt almost any area of memory. Such crashes can be hard to track down, because the code whose memory gets corrupted might be completely unrelated to the code containing the faulty pointer. Uninitialized pointers can also cause unexpected behavior when the memory map of the application changes. This happens when an uninitialized pointer had been corrupting an unused memory block: shifting the memory map or resizing data structures can move memory that is in use into the path of the corrupting access. This type of problem should be suspected when a developer has just changed the size of some data structure and a previously stable application starts crashing.
A special case of this problem is an invalid access resulting from an attempt to read or write through a NULL pointer. How such an access is detected depends heavily on the hardware. On some platforms, reading or writing through a NULL pointer results in an exception. On other platforms, a read through a NULL pointer might go undetected while a write causes a crash. On yet other architectures, both reads and writes through NULL pointers might go undetected.
Another special case involves structures. If UpdateTerminalInfo is called with an uninitialized pointer, the program might not crash when status is updated in the structure, yet crash later in UpdateAdditionalInfo when the info field is updated. This can happen when the beginning of the structure maps to a valid address but the following fields map to illegal addresses.
Unauthorized Buffer Operations
Applications often free an area of memory but continue to use a pointer to it. This can result in crashes that are hard to detect, because the buffer might already have been reallocated to another application, which then sees unexpected behavior. Writes through the stale pointer can also corrupt the heap management data structures, causing a crash inside the memory management subsystem of the operating system.
A common special case of unauthorized buffer operations is a function that frees a buffer and then accesses it again after freeing it. This type of problem might go undetected and might even be harmless on some systems. However, in a multithreaded design, the buffer might already have been allocated to a different thread by the time it is accessed!
Illegal Stack Operations
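Illegal stack operations can lead to hard-to-detect crashes. This typically happens when a program passes a pointer of the wrong type to a function, for example when a function expects a pointer to an integer but the caller passes a pointer to a character. The function then writes a full integer through the pointer, overwriting not only the one-byte character on the caller's stack but also the bytes that follow it, which may hold other local variables, saved registers or the return address.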
Invalid Processor Operations
Processors detect various error conditions and raise exceptions that abort program execution. A few of these conditions are:
- Divide by zero attempted by the application.
- A program running in user mode attempted to execute an instruction that can only be executed in supervisor (kernel) mode.
- A program attempted to access an illegal address. The address might be out of range, or the program might not have the privilege to perform the access. For example, a program attempting to write to a read-only segment will raise an exception.
- Misaligned access to memory. Many processors, particularly the RISC processors common in embedded systems, restrict long word (32-bit) accesses to addresses divisible by 4, and raise an exception if a long word operation is attempted at an address that is not divisible by 4. (See the byte alignment and ordering article for details.)
Infinite Loops
When a program enters an infinite loop, it might crash due to invalid array indexing once the loop index exceeds the array bounds and corrupts memory. In other scenarios, the program keeps looping until a watchdog kicks in and aborts the program. If watchdog functionality is not supported, the system will "hang" and never recover from the error. Thus all embedded systems must be designed to support watchdog reset functionality.
See the article on fault handling techniques for more details about watchdog handling.