Page 88 - Mobile Computing
P. 88

87


               Fault Tolerance


               Fault  Tolerance  simply  means  a  system’s  ability  to  continue  operating
               uninterrupted despite the failure of one or more of its components. This is true
               whether it is a computer system, a cloud cluster, a network, or something else. In
               other words, fault tolerance refers to how an operating system (OS) responds to
               and allows for software or hardware malfunctions and failures.

               An OS’s ability to recover and tolerate faults without failing can be handled by
               hardware, software, or a combined solution leveraging load balancers (see more

               below). Some computer systems use multiple duplicate fault tolerant systems to
               handle faults gracefully. This is called a fault tolerant network.


                                                                                         Web Server

                          Web Application







                                                        Load Balancer





                                                                                      Standby Web Server








               What is Fault Tolerance?

               The goal of fault tolerant computer systems is to ensure business continuity and
               high availability by preventing disruptions arising from a single point of failure.
               Fault  tolerance  solutions  therefore  tend  to  focus  most  on  mission-critical
               applications or systems.

               Fault tolerant computing may include several levels of tolerance:


                     At the lowest level, the ability to respond to a power failure, for example.

                     A  step  up:  during  a  system  failure, the  ability  to  use  a backup  system
                       immediately.
   83   84   85   86   87   88   89   90   91   92   93