|
Home >
Real-time Mantra >
Fault Handling > Hardware Fault Tolerance
|
| Network Load Balancing |
| Network load balancing is a different flavor of load sharing where there is no higher level processor to perform load distribution. Instead, the load distribution is achieved by hashing on the source address bits. For example, many high traffic websites perform load sharing by broadcasting the HTTP Get request over the Ethernet to all the load sharing machines. The network card on the load sharing machines are appropriately configured to pass a certain portion of the HTTP Get requests to the main computer. The remaining requests are filtered out as they will be handled by other machines. If one of the load sharing machine fails, filter settings on all the active machines are appropriately modified to redistribute the traffic. |
For redundancy to work, the standby unit needs to be kept synchronized with the active unit at all times. This is required so that the standby can fit into the active's boots in case the active fails. The standby synchronization can be achieved in the following ways:
In this scheme the active and the standby are locked at processor bus cycle level. To keep itself synchronized with the active unit, the standby unit watches each processor instruction that is performed by active. Then, it performs the same instruction in the next bus cycle and compares the output with that of the active unit. If the output does not match, the standby might takeover and become active. The main disadvantage here is that specialized hardware is needed to implement this scheme. Also, bus cycle level synchronization introduces wait states in bus cycle execution. This will lower the overall performance of the processor.
Here, the system is configured with two CPUs and two parity based memory cards. One of the CPU is active and the other is standby. Both the memory cards are driven by the active CPU. No memory is attached to the standby unit. Each memory write by the active is made to both the memory cards. The data bits and the parity bits are updated individually on both the memory cards. On every memory read, the output of both the memory cards is compared. If a mismatch is detected, the processor believes the memory card with correct parity bit. The other memory card is marked suspected and a fault trigger is generated.
The standby unit continuously monitors the health of the active unit by sanity punching or watchdog mechanism. If a fault is detected, the standby takes over both the memory cards. Since the application context is kept in memory, the new active processor gets the application context.
The main disadvantage here is that specialized hardware is needed to implement this scheme. Also, memory mirroring introduces wait states in bus cycle execution. This will lower the overall performance of the processor.
In this scheme, active unit passes all the messages received from external sources to the standby. The standby performs all the actions as though it were active with the difference that no output is sent to the external world. The main advantage here is that no special hardware is required to implement this. The scheme is practical only in conditions where the processor is required to take fairly simple decisions. In cases of complex decisions, the synchronization can be easily lost if the two processor take different decisions on the same input message.
To some extent, this one is like message level synchronization as active conveys synchronization information in terms of messages to standby. The difference is that all the external world messages are not conveyed. The information is conveyed only about predefined milestones. For example, in a Call Processing system, checkpoints may be passed only when the call reaches conversation or is cleared. If standby takes over, all the calls in conversation would be retained whereas all calls in transient states will be lost. Resource information for the transient calls may be retrieved by running software audits with other modules. This scheme is not prone to loss of synchronization under normal conditions. Also, the message traffic to the standby is reduced, thus improving the overall performance of the active.
In this scheme, no synchronization between the active and the standby. When the standby takes over, it recovers the processor context by requesting information with other modules in the system. The advantage of this scheme lies in its simplicity of implementation. Also, there is no performance overhead due to mate synchronization. The disadvantage is that the standby take over may be delayed due to reconciliation requirements.
|