Debugging Software Crashes

Debugging software crashes is one of the most difficult parts of real-time and embedded software development. Software crashes when an application performs an illegal operation and the operating system is forced to abort it. Here we discuss several causes of crashes in a typical embedded application. A good understanding of how C maps to assembly will help in following the material.

The following software problems lead to crashes:

Invalid Array Indexing

Invalid array indexing is one of the biggest sources of crashes in C and C++ programs. Neither language performs array bounds checking, so invalid indexing often goes undetected during testing. An out-of-bounds index corrupts data structures that were allocated memory after the array. A point often missed when analyzing array indexing problems is that invalid indexing can also corrupt data structures declared before the array. This happens when the array is indexed with a very large unsigned number that represents a negative number in signed arithmetic. Consider an array b that is accidentally indexed with the number 0xFFFFFFFF. On a target with 32-bit addresses the address calculation wraps around, so this access behaves like an access to index -1. It corrupts variables placed before the array, i.e. the memory allocated to a. If the array is indexed with an index greater than 99, it corrupts c. (This assumes the compiler places a, b and c in memory in declaration order, which is typical but not guaranteed.)

Array declaration

Data1 a;     // Corrupted when b is indexed with 0xFFFFFFFF (-1)
int b[100];  // Declaration of b. Keep in mind that array indexing is a signed operation
Data2 c;     // Corrupted when index into b is greater than 99    
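
One common defense is to route writes through a helper that checks the index first. The sketch below is illustrative only: checked_store and B_SIZE are hypothetical names that do not appear in the original code. Taking the index as size_t means a "negative" index such as 0xFFFFFFFF arrives as a very large positive value and is rejected by the same single comparison that catches indices above 99.

```c
#include <stdbool.h>
#include <stddef.h>

#define B_SIZE 100  /* hypothetical name; matches the int b[100] declaration */

/* Hypothetical helper: stores value at b[index] only when index is in range.
 * Because index is unsigned, one comparison rejects both "too large" and
 * "negative" (wrapped) indices. */
static bool checked_store(int *b, size_t size, size_t index, int value)
{
    if (b == NULL || index >= size) {
        return false;   /* reject instead of corrupting neighboring data */
    }
    b[index] = value;
    return true;
}
```

The check costs one comparison per access, so some designs keep it only in debug builds or at module boundaries where indices come from outside.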

Un-initialized Pointer Operations

Un-initialized pointer operations are another major cause of crashes in C and C++ programs. The problem is so acute that Java does not expose raw pointers at all, and C# permits them only in code explicitly marked unsafe. If a pointer is not initialized before it is used, the access can corrupt practically any area of memory. The resulting crashes can be hard to trace because the pointer causing the corruption may be located in a completely unrelated area of the code. Un-initialized pointers can also lead to unexpected behavior when the memory map of the application is modified. This happens if an un-initialized pointer operation had been corrupting an unused memory block: shifting the memory map or resizing data structures might cause the same access to modify memory that is in use. Suspect this type of problem when a developer has just changed the size of some data structure and a previously stable application starts crashing.

A special case of this problem is an attempt to read or write through a NULL pointer. Here the detection of the problem is very much hardware dependent. On some platforms, any read or write through a NULL pointer results in an exception. On other platforms, a read through a NULL pointer might go undetected but a write results in a crash. On yet other architectures, both reads and writes through NULL pointers might go undetected.

Another special condition is described below. If UpdateTerminalInfo is called with an un-initialized pointer, there is a possibility that the program does not crash when status is updated in the structure but it crashes in UpdateAdditionalInfo when the info variable is updated. This can happen if the beginning of the structure maps to a valid address but following elements map to illegal addresses.

Uninitialized pointer

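typedef struct
{
    int status;
    // . . .

    int info;
} TerminalInfo;

// Declared before use so that the compiler can check the call below
void UpdateAdditionalInfo(TerminalInfo *pTermInfo);

void UpdateTerminalInfo(TerminalInfo *pTermInfo)
{
    // With a bad pointer, this write may land on a valid address and go unnoticed
    pTermInfo->status = INSERVICE;
    UpdateAdditionalInfo(pTermInfo);
}

void UpdateAdditionalInfo(TerminalInfo *pTermInfo)
{
    // info lies further into the structure and may map to an illegal address
    pTermInfo->info = TERMINAL_INFO;
}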
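
A partial defense is to initialize every pointer to NULL at its declaration and have functions reject NULL before dereferencing. The sketch below assumes a reduced TerminalInfo with only the two fields used here, and the values given to INSERVICE and TERMINAL_INFO are placeholders, since the original does not define them. UpdateTerminalInfoChecked is a hypothetical name. Note the limit of this technique: it catches a pointer that was set to NULL, not one that holds leftover garbage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values; the original constants are not defined in the article. */
enum { INSERVICE = 1, TERMINAL_INFO = 2 };

/* Reduced to the two fields used in this sketch. */
typedef struct {
    int status;
    int info;
} TerminalInfo;

/* Hypothetical checked variant: refuses to touch memory through NULL. */
static bool UpdateTerminalInfoChecked(TerminalInfo *pTermInfo)
{
    if (pTermInfo == NULL) {
        return false;   /* caller forgot to point at a real structure */
    }
    pTermInfo->status = INSERVICE;
    pTermInfo->info = TERMINAL_INFO;
    return true;
}
```

A caller that declares `TerminalInfo *p = NULL;` and forgets to assign it then gets a false return value rather than a write to an arbitrary address.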

Unauthorized Buffer Operations

Applications often free an area of memory but continue to use a pointer to it. This can result in hard-to-detect crashes, as the buffer might have been reallocated to some other application, where the stray access then causes unexpected behavior. It might also cause a crash in the memory management subsystem of the operating system, as an unauthorized buffer access can corrupt the heap management data structures.

A special case of unauthorized buffer operations is covered below. Here the function frees the buffer and then attempts to access it. This type of problem might go undetected and might even be harmless on some systems. However, in a multithreaded design, the buffer might have already been allocated to a different thread!

Unauthorized buffer operation

void foo(Data1 *buf)
{
   // buf is freed in this line
   free(buf);
   
   // An access is attempted to buf even after it has been freed up.
   // This might cause a problem if the thread got descheduled between
   // the free statement and unauthorized buffer operation. The buffer
   // might have already been allocated to a different thread!
   buf->x = NULL;    
} 
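
A widely used mitigation is to clear the pointer in the same step that frees the buffer, so that a later access goes through NULL instead of through a stale address. The sketch below is an assumption-laden illustration: release_data is a hypothetical helper and Data1 is given a single int member only so that the example compiles. It clears just the one pointer it is handed; any other copies of the address elsewhere in the program remain dangling.

```c
#include <stdlib.h>

/* Hypothetical layout; the article does not show the members of Data1. */
typedef struct {
    int x;
} Data1;

/* Hypothetical helper: frees *ppBuf and resets the caller's pointer to NULL.
 * Calling it twice on the same pointer is harmless. */
static void release_data(Data1 **ppBuf)
{
    if (ppBuf != NULL && *ppBuf != NULL) {
        free(*ppBuf);
        *ppBuf = NULL;   /* the caller can no longer reach the freed block */
    }
}
```

The helper takes a pointer to the pointer because C passes arguments by value; a function receiving only `Data1 *buf` could not reset the caller's copy.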

Illegal Stack Operations

Illegal stack operations can lead to hard-to-detect crashes. This typically takes place when a program passes a pointer of the wrong type to a function. The example below shows a function that expects an integer pointer while the caller passes a pointer to a character.

char pointer/int pointer mixup

int main()
{
   char count;
   // The routine expects an int pointer but a char pointer has been passed
   // Older compilers and non ANSI C compilers do not catch this error
   GetCount(&count);
   // The called function was expecting an int (say 4 byte) variable. It was
   // however passed a char pointer with one byte space. GetCount will still
   // write four bytes, thus corrupting local variables or parameters on the
   // stack
}

bool GetCount(int *pCount)
{
  . . .
  *pCount = returnValue;
  return true;
} 
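
The mixup above can be avoided by receiving the result in a variable of the type the function expects and narrowing it afterwards. In the sketch below, the body of GetCount is a placeholder (the original elides it), the value 42 is arbitrary, and GetCountAsChar is a hypothetical wrapper. Declaring GetCount before its first use also lets the compiler reject a mismatched pointer type.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder stand-in for the article's GetCount; the real body is elided
 * in the original, so the value written here is arbitrary. */
static bool GetCount(int *pCount)
{
    if (pCount == NULL) {
        return false;
    }
    *pCount = 42;
    return true;
}

/* Hypothetical wrapper: lets GetCount write into a full-sized int, then
 * narrows to char only if the value fits. */
static bool GetCountAsChar(char *pOut)
{
    int count = 0;   /* correctly sized target for GetCount */

    if (pOut == NULL || !GetCount(&count)) {
        return false;
    }
    if (count < CHAR_MIN || count > CHAR_MAX) {
        return false;   /* would not fit in a char */
    }
    *pOut = (char)count;
    return true;
}
```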

Invalid Processor Operations

Processors detect various exception conditions and abort program execution when they detect an error condition. A few of these conditions are:

  • Divide by zero attempted by application
  • Program running in user mode attempted to execute an instruction that can only be executed in supervisor (kernel) mode.
  • Program attempted access to an illegal address. The address might be out of range or the program might not have the privilege to perform the access. For example, a program attempting to write to a read-only segment will result in an exception.
  • Misaligned access to memory can also result in an exception. Many processors restrict long word (4-byte) accesses to addresses divisible by 4 and raise an exception if a long word operation is attempted at any other address. (See the byte alignment and ordering article for details)
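
Two of the conditions listed above can be guarded against in application code. The sketch below uses hypothetical helper names (safe_div, read_u32_unaligned) and is not taken from the original article. It rejects a zero divisor, along with INT_MIN / -1, which overflows and also traps on some processors. It reads a 32-bit value from an arbitrary address by copying the bytes with memcpy instead of dereferencing a possibly misaligned pointer.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: performs num / den only when the division is defined. */
static bool safe_div(int num, int den, int *result)
{
    if (result == NULL || den == 0) {
        return false;   /* would raise a divide-by-zero exception */
    }
    if (num == INT_MIN && den == -1) {
        return false;   /* quotient does not fit in an int */
    }
    *result = num / den;
    return true;
}

/* Hypothetical helper: reads 4 bytes in host byte order from any address.
 * memcpy makes no alignment assumption about p. */
static uint32_t read_u32_unaligned(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers typically turn the fixed-size memcpy into a plain load on targets that tolerate misaligned access, so the portable form usually costs little.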

Infinite Loop

When a program enters an infinite loop, it might crash due to invalid array indexing when the loop index exceeds the array bounds and corrupts memory. In other scenarios, the program continues to loop until a watchdog kicks in and aborts the program. If watchdog functionality is not supported, the system will "hang" and never recover from the error. Thus all embedded systems must be designed to support watchdog reset functionality.
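
A watchdog is a last line of defense; loops that scan a buffer can also be bounded in software so that a missing terminator does not carry the index past the end of the array. The sketch below is a minimal illustration with a hypothetical function name (find_terminator). It is not a substitute for a hardware watchdog, which is platform specific and not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: looks for terminator in buf, but never examines more
 * than max_len elements, so the loop always ends inside the array. */
static bool find_terminator(const int *buf, size_t max_len,
                            int terminator, size_t *pos)
{
    if (buf == NULL || pos == NULL) {
        return false;
    }
    for (size_t i = 0; i < max_len; i++) {
        if (buf[i] == terminator) {
            *pos = i;
            return true;
        }
    }
    return false;   /* terminator missing: report it instead of running on */
}
```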

See the article on fault handling techniques for more details about watchdog handling.