Page 89 - Mobile Computing
P. 89
88
Enhanced fault tolerance: a disk fails, and mirrored disks take over for it
immediately. This provides functionality despite partial system failure, or
graceful degradation, rather than an immediate breakdown and loss of
function.
High level fault tolerant computing: multiple processors collaborate to scan
data and output to detect errors, and then immediately correct them. Fault
tolerance software may be part of the OS interface, allowing the
programmer to check critical data at specific points during a transaction.
Fault-tolerant systems ensure no break in service by using backup
components that take the place of failed components automatically. These
may include:
Hardware systems with identical or equivalent backup operating systems.
For example, a server with an identical fault tolerant server mirroring all
operations in backup, running in parallel, is fault tolerant. By eliminating
single points of failure, hardware fault tolerance in the form of redundancy
can make any component or system far safer and more reliable.
Software systems backed up by other instances of software. For example,
if you replicate your customer database continuously, operations in the
primary database can be automatically redirected to the second database if
the first goes down.
Redundant power sources can help avoid a system fault if alternative
sources can take over automatically during power failures, ensuring no loss
of service.
High Availability vs Fault Tolerance
Highly available systems are designed to minimize downtime to avoid loss of
service. Expressed as a percentage of total running time in terms of a system’s
uptime, 99.999 percent uptime is the ultimate goal of high availability.
Although both high availability and fault tolerance reference a system’s total
uptime and functionality over time, there are important differences and both
strategies are often necessary. For example, a totally mirrored system is fault-
tolerant; if one mirror fails, the other kicks in and the system keeps working with
no downtime at all. However, that’s an expensive and sometimes unwieldy
solution.