|
This article continues our discussion on debugging
software crashes. Here we focus on the symptoms of memory corruption
crashes. We will also look at the special considerations in debugging C++ code
crashes. Finally, we will look at techniques to simplify crash debugging.
Programs store data in any of the following ways:
| Global | All variables or objects declared as global in a C/C++ program fall into this category. This also includes static variable declarations. |
| Heap | Memory allocated using new or malloc is allocated on the heap. In many systems, stack and heap are allocated from opposite sides of a memory block. (See the figure below) |
| Stack | Local variables are stored on the stack, and function parameters are typically passed on it. The stack is also used for storing the return address of the calling function. The stack also holds the register contents and return address when an interrupt service routine is called. |
Memory corruption in the global area, stack or the heap can have confusing
symptoms. These symptoms are explored here.
If a global data location is found to be corrupted, there is a good chance that
this is caused by an array index overflow from the preceding global data
declarations. The corruption might also have been caused by an array index
underflow (array accessed with a negative index) from the following variable
declarations. The following rules should be helpful in debugging this condition:
- If you have a debugging system which allows you to put breakpoints on data
writes to a certain location, use that feature to find the offending program
corrupting the memory. If you don't have the luxury of such a tool, the
following steps might help.
- If the variable is part of a structure, check if overflow/underflow of
previous or next variables in the structure could have caused this
corruption.
- If other structure member access seems harmless, use the linker generated
symbol map to locate other global variables declared in the vicinity of the
corrupted structure. Examine the data structures to determine if they could
have caused the corruption.
- Sometimes looking at the corrupted memory locations can also give a good
idea of the cause of corruption. You might be able to recognize a string or
data pattern identifying the culprit. This might be your only hope if the
corruption is caused by an uninitialized pointer.
- The extent of the corruption might also give a clue to its cause.
Try to determine the starting and ending points of the corruption (only
possible if the corrupting program is writing in an identifiable pattern).
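The last rule can be sketched in code. The fragment below is only an illustration and assumes that the damaged area originally held a known fill value, for example a guard array deliberately placed between global variables. The names FillRegion, FindCorruptionExtent and CorruptionExtent are hypothetical and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Result of scanning a region that was expected to hold a known fill value.
struct CorruptionExtent
{
    bool found;          // true if any byte differs from the fill value
    std::size_t first;   // offset of the first corrupted byte
    std::size_t last;    // offset of the last corrupted byte
};

// Fill a region with the known value (done once, at startup).
void FillRegion(std::uint8_t* region, std::size_t size, std::uint8_t fill)
{
    assert(region != nullptr);
    std::memset(region, fill, size);
}

// Report the span of bytes that no longer match the fill value.
CorruptionExtent FindCorruptionExtent(const std::uint8_t* region,
                                      std::size_t size,
                                      std::uint8_t fill)
{
    assert(region != nullptr);
    CorruptionExtent extent = {false, 0, 0};
    for (std::size_t i = 0; i < size; ++i)
    {
        if (region[i] != fill)
        {
            if (!extent.found)
            {
                extent.found = true;
                extent.first = i;
            }
            extent.last = i;
        }
    }
    return extent;
}
```

The first and last offsets, together with the linker map, help narrow down which neighboring variable overflowed. The bytes between them can then be inspected for a recognizable string or data pattern.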
Corruption on the heap can be very hard to detect. A heap corruption could
lead to a crash in heap management primitives that are invoked by memory
management functions like malloc and free. It might be very hard to detect the
original source of corruption as the buffer that led to corruption of adjacent
buffers might have long been freed. Guidelines for debugging crashes in the heap
area are:
- If a crash is observed in memory management primitives of the operating
system, heap corruption is a possibility. Memory buffer corruption
sometimes corrupts the linked list the OS uses to manage buffers,
causing crashes in OS code.
- If a memory corruption is observed in an allocated buffer, check the
buffers in the vicinity of this buffer to look for the source of corruption.
- Corruption of buffers close to the heap boundary might be due to stack
overflow or stack overwrite leading to heap corruption (see the above
figure).
- Conversely, stack corruption might take place if a write into the heap
overflows and corrupts the stack area.
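One common way to catch the buffer that overflows, while it is still allocated, is to surround each allocation with guard bytes. The following is a minimal sketch of that idea, not the allocator of any particular operating system. GuardedAlloc, GuardsIntact, GuardedFree and the guard sizes are hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Front zone holds the user size followed by guard bytes. Its size is a
// multiple of the strictest alignment so the user pointer stays aligned.
const std::size_t kFrontSize = 2 * alignof(std::max_align_t);
const std::size_t kRearSize = 8;
const unsigned char kGuardByte = 0xFD;   // arbitrary but recognizable value
static_assert(kFrontSize > sizeof(std::size_t), "no room for the front guard");

// Allocate `size` usable bytes with guard zones on both sides.
void* GuardedAlloc(std::size_t size)
{
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kFrontSize + size + kRearSize));
    if (raw == nullptr)
        return nullptr;
    std::memcpy(raw, &size, sizeof size);
    std::memset(raw + sizeof size, kGuardByte, kFrontSize - sizeof size);
    std::memset(raw + kFrontSize + size, kGuardByte, kRearSize);
    return raw + kFrontSize;
}

// Return true if neither guard zone has been overwritten.
// Note: an underflow that reaches the size field defeats the rear check.
bool GuardsIntact(const void* user)
{
    const unsigned char* raw =
        static_cast<const unsigned char*>(user) - kFrontSize;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof size);
    for (std::size_t i = sizeof size; i < kFrontSize; ++i)
        if (raw[i] != kGuardByte)
            return false;
    for (std::size_t i = 0; i < kRearSize; ++i)
        if (raw[kFrontSize + size + i] != kGuardByte)
            return false;
    return true;
}

// Free the block and report whether its guards were still intact.
bool GuardedFree(void* user)
{
    if (user == nullptr)
        return true;
    const bool intact = GuardsIntact(user);
    std::free(static_cast<unsigned char*>(user) - kFrontSize);
    return intact;
}
```

Checking the guards at free time, or periodically from an audit task, points at the buffer that was overrun rather than at the neighbor that later crashed.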
Stack corruption produces by far the most varied symptoms. Modern programming
languages use the stack for a large number of operations, such as maintaining local
variables, passing function parameters and managing function return addresses. See
the article on C to assembly translation
for details.
Here are the rules for debugging stack corruption:
- If a crash is observed when a function returns, this might be due to stack
corruption. The return address on the stack might have been corrupted by
stack operations of called functions.
- A crash after an interrupt service routine returns might also be caused by
stack corruption.
- Stack corruption can also be suspected when a passed parameter seems to
have a value different from the one passed by the calling function.
- When a stack corruption is detected, one should look at the local
variables in the called and calling functions to look for possible sources
of memory corruption. Check array and pointer declarations for sources of
errors.
- Sometimes stray corruption of a processor's registers might also be due to
stack corruption. If a register gets corrupted for no apparent reason, one
possibility is that an offending thread or program corrupted the register
context saved on the stack. When the register is restored as a part of a context
switch, the task crashes.
- Corruption in the heap can trickle down to the stack.
- Stack overflow takes place when a program's function nesting exceeds the
stack allocated to the program. This can cause corruption of the stack area or
the heap area, depending on which attempts to access the corrupted memory first:
a heap operation or a stack operation.
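Stack overflow is often checked for by "painting" a task's stack with a known value before the task starts and later measuring how much paint survives. The sketch below assumes a stack that grows downward toward the start of its memory area, which is common but not universal. PaintStack, UnusedStackBytes and StackOverflowSuspected are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::uint8_t kStackPaint = 0xA5;   // arbitrary fill value

// Fill the whole stack area with the paint value before the task runs.
void PaintStack(std::uint8_t* stackBase, std::size_t stackSize)
{
    assert(stackBase != nullptr);
    std::memset(stackBase, kStackPaint, stackSize);
}

// Count the bytes at the low end that still hold the paint value.
// Assumption: the stack grows from stackBase + stackSize down to stackBase.
// The result can be slightly high if the deepest stack data happens to
// contain the paint value itself.
std::size_t UnusedStackBytes(const std::uint8_t* stackBase,
                             std::size_t stackSize)
{
    assert(stackBase != nullptr);
    std::size_t unused = 0;
    while (unused < stackSize && stackBase[unused] == kStackPaint)
        ++unused;
    return unused;
}

// No paint left means the task reached, and possibly passed, its limit.
bool StackOverflowSuspected(const std::uint8_t* stackBase,
                            std::size_t stackSize)
{
    return UnusedStackBytes(stackBase, stackSize) == 0;
}
```

Logging the unused byte count for each task during testing gives early warning well before nesting depth actually runs past the allocated stack.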
So far we have discussed crash debugging techniques that apply equally well to
C and C++. This section covers crash debugging techniques that are
specific to C++.
Many C++ developers get confused by crashes that involve method invocation on
a corrupted pointer. Developers need to realize that invoking a non-virtual
method on an illegal object pointer is, in practice, equivalent to passing an
illegal pointer to a function. The call is formally undefined behavior, but with
most compilers the crash results only when a member variable is accessed in the
called method.
In the example given below, when HandleMsg() is
invoked for a NULL pX, the crash will typically result only
when an access is attempted to member variables of X. There will be no problem
in calling PrepareForMessage() or HandleYMsg()
for the Y pointer. (For more details on this, refer to the C
and C++
article.)
| Corrupted Object Pointer Access |
class X
{
    int m_x;
public:
    void HandleMsg(Y *pY, Msg *pMsg)
    {
        pY->PrepareForMessage();
        pY->HandleYMsg(pMsg);
        m_x = pMsg->GetX(); // Crash takes place here
    }
};
int main()
{
    X *pX = NULL;
    Y y;
    . . .
    // pX is still NULL
    pX->HandleMsg(&y, pMsg);
}
|
| Inheriting Classes |
class A
{
    int m_a;
    int m_array[MAX_ARRAY];
public:
    void SetA(int a);
    int GetA() const;
    virtual void SendCommand() = 0;
};
class B : public A
{
    int m_b;
public:
    void SetB(int b);
    int GetB() const;
    void SendCommand(); // Override method
};
|
All classes with virtual functions have a pointer to the V-table
corresponding to overrides for that class. In the layout assumed here, the
V-table pointer is stored just after the data members of the base class. The
exact position is compiler dependent, and many compilers place it at the start
of the object instead. Corruption of the V-table
pointer can baffle developers as the real problem often gets hidden by the
symptoms of the crash.
The figure above shows the declaration of classes A and
B. The figure below
shows the memory layout for an object of class B. If m_array is indexed
with an index exceeding its size, the first variable to be corrupted will be the
V-table pointer. This problem will manifest as a crash on invoking method
SendCommand. The reason this happens is that SendCommand is a virtual function,
so the call is dispatched through the virtual table. If the virtual table pointer
is corrupted, calling this function will take you to never-never land.
| int m_a |
| int m_array[MAX_ARRAY] |
| VTable *vptr |
| int m_b |
For more details on V-table organization, refer to the C
and C++ Comparison II article.
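One way to keep an m_array overrun from ever reaching the V-table pointer is to route all array accesses through a bounds-checked accessor. The sketch below reuses the class names from the declaration above. SetArrayElement, GetArrayElement, CommandsSent, the m_commandsSent member and the value of MAX_ARRAY are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t MAX_ARRAY = 8;   // assumed value for this sketch

class A
{
    int m_a;
    int m_array[MAX_ARRAY];
public:
    A() : m_a(0), m_array() {}
    virtual ~A() {}
    void SetA(int a) { m_a = a; }
    int GetA() const { return m_a; }

    // Hypothetical accessor: refuses any index that would write past
    // m_array and into whatever follows it in the object.
    bool SetArrayElement(std::size_t index, int value)
    {
        if (index >= MAX_ARRAY)
            return false;
        m_array[index] = value;
        return true;
    }

    int GetArrayElement(std::size_t index) const
    {
        assert(index < MAX_ARRAY);
        return m_array[index];
    }

    virtual void SendCommand() = 0;
};

class B : public A
{
    int m_commandsSent;   // illustrative member, not in the article
public:
    B() : m_commandsSent(0) {}
    void SendCommand() override { ++m_commandsSent; }
    int CommandsSent() const { return m_commandsSent; }
};
```

With the check in place, an out-of-range index is reported at the point of the bad write instead of surfacing later as a crash inside SendCommand.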
Many C++ programs involve a lot of dynamic memory allocation by new. Many C++
crashes can be attributed to not checking for memory allocation failure. In C++
the failure can be handled in two ways:
- Handle the out of memory exception (std::bad_alloc) thrown by new.
- Check for a NULL pointer when new is used in a form that does not throw,
such as new (std::nothrow).
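Both options can be sketched as follows. To make the failure path testable, the hypothetical Msg type below has class-specific allocation functions that fail on demand through a simulateOutOfMemory flag. That flag and the two Allocate helpers are assumptions made for this illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical message type with a test hook that forces allocation failure.
struct Msg
{
    static bool simulateOutOfMemory;
    int digits[16];

    // Throwing form: reports failure with std::bad_alloc.
    static void* operator new(std::size_t size)
    {
        void* p = simulateOutOfMemory ? nullptr : std::malloc(size);
        if (p == nullptr)
            throw std::bad_alloc();
        return p;
    }

    // Non-throwing form: reports failure with a null pointer.
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept
    {
        return simulateOutOfMemory ? nullptr : std::malloc(size);
    }

    static void operator delete(void* p) noexcept { std::free(p); }
    static void operator delete(void* p, const std::nothrow_t&) noexcept
    {
        std::free(p);
    }
};
bool Msg::simulateOutOfMemory = false;

// Option 1: handle the out of memory exception.
Msg* AllocateWithException()
{
    try
    {
        return new Msg();
    }
    catch (const std::bad_alloc&)
    {
        return nullptr;   // recover here, for example by rejecting the request
    }
}

// Option 2: use the non-throwing form; the caller must check for NULL.
Msg* AllocateNoThrow()
{
    return new (std::nothrow) Msg();
}
```

Whichever form is used, the caller still has to act on the failure. Dereferencing the returned pointer without a check simply moves the crash to the first member access.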
Here are a few simple techniques for simplifying crash debugging:
Make sure that every embedded processor in the system supports dumping of the
stack at the time of a crash. The crash dump should be saved in non-volatile
memory so that it can be retrieved by tools on processor reboot. In fact, an attempt
should be made to save as much as possible of the processor state and key data
structures at the time of the crash.
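A crash record of this kind might look like the sketch below. It is only an outline under stated assumptions: a static variable stands in for the non-volatile memory, the register set is reduced to a program counter and stack pointer, and CrashRecord, SaveCrashRecord and LoadCrashRecord are hypothetical names. A real system would call the save routine from its exception handler and use the platform's own storage driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::uint32_t kCrashMagic = 0xC0DEDEADu;   // marks a valid record
const std::size_t kSavedStackWords = 16;

struct CrashRecord
{
    std::uint32_t magic;
    std::uint32_t programCounter;
    std::uint32_t stackPointer;
    std::uint32_t stackWords[kSavedStackWords];
    std::uint32_t checksum;   // sum of all preceding fields
};

// Stand-in for non-volatile memory (assumption of this sketch).
static CrashRecord g_nonVolatileArea;

std::uint32_t ComputeChecksum(const CrashRecord& record)
{
    std::uint32_t sum =
        record.magic + record.programCounter + record.stackPointer;
    for (std::size_t i = 0; i < kSavedStackWords; ++i)
        sum += record.stackWords[i];
    return sum;
}

// Called at crash time: capture the registers and the top of the stack.
void SaveCrashRecord(std::uint32_t pc, std::uint32_t sp,
                     const std::uint32_t* stackTop, std::size_t wordCount)
{
    assert(stackTop != nullptr || wordCount == 0);
    CrashRecord record = CrashRecord();
    record.magic = kCrashMagic;
    record.programCounter = pc;
    record.stackPointer = sp;
    if (wordCount > kSavedStackWords)
        wordCount = kSavedStackWords;
    if (wordCount > 0)
        std::memcpy(record.stackWords, stackTop,
                    wordCount * sizeof(std::uint32_t));
    record.checksum = ComputeChecksum(record);
    g_nonVolatileArea = record;
}

// Called after reboot: returns true once for each valid saved record.
bool LoadCrashRecord(CrashRecord* out)
{
    assert(out != nullptr);
    if (g_nonVolatileArea.magic != kCrashMagic)
        return false;
    if (g_nonVolatileArea.checksum != ComputeChecksum(g_nonVolatileArea))
        return false;
    *out = g_nonVolatileArea;
    g_nonVolatileArea.magic = 0;   // consume the record
    return true;
}
```

The magic value and checksum let the boot code tell a genuine crash record from the random contents of memory after a cold start.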
An ounce of prevention is worth a pound of cure. Checking for crash causing
conditions with the assert macro can be a very useful way of detecting problems
long before they lead to a crash. Basically, assert macros check for a condition
which the function assumes to be true. For example, the code below shows an
assertion which checks that the message to be processed is non NULL. During
initial debugging of the system this assert condition might help you detect the
condition before it leads to a crash.
Note that asserts add no overhead to the shipped system. Release builds
typically define NDEBUG, which makes the assert macro expand to nothing,
effectively removing all the assert conditions.
| assert usage |
void HandleOrigination(const OriginationMsg *pMsg)
{
    assert(pMsg);
    assert(pMsg->numberOfDigits != 0);
    . . .
}
|
Similar to asserts, defensive checks can often save the system
from a crash even when an invalid condition is detected. The main difference
is that, unlike asserts, defensive checks remain in the shipped system. Tracing
and maintaining event history can also be very useful in debugging crashes in
the early phase of development. However, tracing is of limited use in debugging
once the system has been shipped. |
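A defensive check and a small event history can be combined as in the sketch below. It reuses the HandleOrigination and OriginationMsg names from the assert example. The EventHistory class, the event identifiers and the history size are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct OriginationMsg
{
    int numberOfDigits;
};

enum EventId
{
    EVENT_NONE = 0,
    EVENT_ORIGINATION_OK,
    EVENT_ORIGINATION_REJECTED
};

const std::size_t kHistorySize = 4;   // small on purpose for the sketch

// Fixed-size ring buffer that keeps only the most recent events.
class EventHistory
{
    EventId m_events[kHistorySize];
    std::size_t m_next;    // slot that the next event will occupy
    std::size_t m_count;   // number of valid entries
public:
    EventHistory() : m_events(), m_next(0), m_count(0) {}

    void Record(EventId id)
    {
        m_events[m_next] = id;
        m_next = (m_next + 1) % kHistorySize;
        if (m_count < kHistorySize)
            ++m_count;
    }

    std::size_t Count() const { return m_count; }

    // age 0 is the most recent event; EVENT_NONE if nothing that old exists.
    EventId Recent(std::size_t age) const
    {
        if (age >= m_count)
            return EVENT_NONE;
        return m_events[(m_next + kHistorySize - 1 - age) % kHistorySize];
    }
};

// Defensive version: rejects bad input instead of crashing, and leaves
// a trace entry behind for later analysis.
bool HandleOrigination(const OriginationMsg* pMsg, EventHistory& history)
{
    if (pMsg == nullptr || pMsg->numberOfDigits == 0)
    {
        history.Record(EVENT_ORIGINATION_REJECTED);
        return false;
    }
    history.Record(EVENT_ORIGINATION_OK);
    return true;
}
```

Because the history is a fixed-size ring buffer, it costs a bounded amount of memory and can be included in a crash dump.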