Page 460 - Handbook of Modern Telecommunications

Network Management and Administration                                     3-251



            [Figure: OSS/BSS Process & Technology Integration, Assurance. Labels shown: Product lifecycle
            management; Service offer monitoring; Service monitoring; Service quality management; Incident &
            problem management; Consolidated operations; Domain management; Test & diagnostics; Fault
            management; Performance management; Fulfillment; Usage; Assurance functionality.]
            FIGURE 3.10.5  Assurance functions.
            that a particular domain is performing as expected. Domains are typically structured by the technology
            being managed, but may also be separated organizationally or for regulatory reasons.
              The most fundamental aspect of resource management is knowing whether the various infrastruc-
            ture components are working or not working. This is the role of fault management. It provides for the
            collection and correlation of alarms and other relevant events to provide an accurate view of the health
            of the infrastructure.
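
The collection and correlation step described above can be illustrated with a minimal sketch. It assumes a simple correlation rule that the text does not prescribe: alarms on the same resource that arrive within a fixed time window of each other are treated as one group. The `Alarm` fields, the severity scale, and the 60-second window are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    resource: str     # hypothetical component identifier
    severity: int     # assumed scale: 1 = warning, 2 = major, 3 = critical
    timestamp: float  # seconds since an arbitrary reference point


def correlate(alarms, window=60.0):
    """Group alarms per resource.

    An alarm joins the latest group for its resource when it arrives at most
    `window` seconds after the previous alarm in that group; otherwise it
    starts a new group.
    """
    by_resource = defaultdict(list)
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        groups = by_resource[alarm.resource]
        if groups and alarm.timestamp - groups[-1][-1].timestamp <= window:
            groups[-1].append(alarm)
        else:
            groups.append([alarm])
    return dict(by_resource)


def health_view(alarms, window=60.0):
    """Map each resource to the worst severity in its most recent alarm group."""
    return {
        resource: max(a.severity for a in groups[-1])
        for resource, groups in correlate(alarms, window).items()
    }
```

Real correlation engines typically also use topology and alarm type; the time-window rule here only shows the general shape of turning raw alarms into a per-component health view.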
              The absence of faults does not necessarily mean that the infrastructure is running properly. Though
            the infrastructure may be functioning, it may not be performing adequately; the load on it may be more
            than it can handle. This is where performance management comes in.
              Performance management involves the collection and analysis of performance data. The data can be
            collected from performance counters in the equipment or in element managers. It may also be collected
            from instrumentation added in the form of probes. These probes could be passive (monitoring activi-
            ties and taking measurements) or active (simulating a demand for service and measuring the pertinent
            response times).
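
As a rough illustration of an active probe, the sketch below simulates a demand for service and measures the response time of each attempt. The `request` argument is a hypothetical stand-in for whatever operation exercises the service (for example, a function that performs one lookup or one test call); it is not an interface defined in the text.

```python
import time
from statistics import mean


def active_probe(request, samples=5):
    """Call `request` repeatedly and return each response time in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        request()  # the simulated demand for service
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce raw response times to a few figures suitable for reporting."""
    return {"min": min(timings), "mean": mean(timings), "max": max(timings)}
```

A passive probe would differ in that it only observes existing traffic; it would record measurements without calling anything itself.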
              Real-time performance management monitors the collected data to make sure it stays within
            prespecified parameters. Whenever the data crosses a predefined threshold, an event is sent to the fault
            management system. Beyond this real-time monitoring, the data is also used to identify trends and
            create forecasts.
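
Both uses of performance data can be sketched briefly. The first function reports an event only when a series crosses above an upper threshold, not on every sample that remains above it. The second fits a straight line by ordinary least squares and extrapolates it. Both the single upper threshold and the linear trend model are simplifying assumptions; the text does not specify how thresholds or forecasts are computed.

```python
def threshold_events(samples, upper):
    """Return (index, value) pairs for each upward crossing of `upper`."""
    events = []
    above = False
    for index, value in enumerate(samples):
        if value > upper and not above:
            events.append((index, value))  # would be sent to fault management
        above = value > upper
    return events


def linear_forecast(samples, steps_ahead):
    """Extrapolate a least-squares line fitted to equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)
```

Reporting only the crossing keeps a sustained overload from flooding the fault management system with one event per sample.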
              In addition to monitoring the state and performance of the infrastructure, it is also important to be
            able to run tests and diagnostics, both to gather further information that helps explain faults or poor
            performance and to verify that all the components respond as they should. This verification can be a
            desirable last step of a provisioning process.
              Finally, incident and problem management supports the resolution of any incidents and problems.
            We use the ITIL terminology here rather than a more traditional telecom terminology since it brings
            additional clarity. An incident is an event that causes or may cause a disruption in a service or its quality.
            A problem is the root cause behind one or more actual or potential incidents. For example, an incident
            could be “excessive retransmissions on a link”; the problem could be “failing board” or “heavy rain
            causing microwave signal degradation.” The goal of incident management is to resolve the incident as
            quickly as possible. The goal of problem management is to resolve the problem so that it no longer exists
            or, if this is not possible, capture the necessary knowledge so that any incidents that do occur can be
            identified and resolved faster.
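
The relationship between incidents and problems can be made concrete with a small data model. This is only a sketch under assumptions: the class names, the `workaround` field (standing in for the "necessary knowledge" captured when a problem cannot be eliminated), and the helper functions are hypothetical and do not come from ITIL or from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    root_cause: str
    workaround: Optional[str] = None  # captured knowledge, if any
    resolved: bool = False


@dataclass
class Incident:
    description: str
    problem: Optional[Problem] = None  # root cause, once identified
    resolved: bool = False


def resolution_hint(incident):
    """Return knowledge captured for the incident's problem, if it still applies."""
    problem = incident.problem
    if problem is None or problem.resolved:
        return None
    return problem.workaround


def open_incidents(incidents, problem):
    """List unresolved incidents that share the given root cause."""
    return [i for i in incidents if i.problem is problem and not i.resolved]
```

In this model several incidents may point to one problem. Resolving an incident does not resolve the problem behind it, which mirrors the separate goals of incident management and problem management.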