Page 173 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
         August 31, 2006
        Goodness-of-Fit Tests for Normality
        as follows:
        \[
          F_n(x) =
          \begin{cases}
            0, & x < x_{(1)}, \\
            r/n, & x_{(r)} \le x < x_{(r+1)}, \quad r = 1, \ldots, n-1, \\
            1, & x_{(n)} \le x.
          \end{cases}
        \]
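As a small illustration of this definition, the EDF can be evaluated by counting how many observations are less than or equal to x. The sketch below is only one way to do it; the helper name `edf` and the sample values are illustrative and do not come from the text.

```python
from bisect import bisect_right


def edf(sample):
    """Return F_n as a function giving the fraction of observations <= x."""
    ordered = sorted(sample)  # order statistics x_(1) <= ... <= x_(n)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the observations that are <= x
        return bisect_right(ordered, x) / n

    return F_n


F_n = edf([2.1, 3.4, 1.7, 2.9])  # illustrative data, not from the text
print(F_n(1.0), F_n(2.1), F_n(3.0), F_n(3.4))  # 0.0 0.5 0.75 1.0
```

The function is a right-continuous step function: it jumps by 1/n at each order statistic and stays constant in between.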
        The development of EDF-based approaches initially focused on continuous data.
        Subsequently, several modifications were developed for discrete and grouped data.
          EDF-based statistics can be broadly classified into two types, based on how the
        deviation between the EDF ($F_n$) and the hypothesized CDF ($F_0$) is measured.⁹ The
        'supremum' class of statistics computes the maximum deviation between
        $F_n$ and $F_0$. This class includes the well-known Kolmogorov–Smirnov $D$ statistic. The
        'quadratic' class computes the following measure of deviation between $F_n$ and $F_0$:
        \[
          Q^2 = n \int_{-\infty}^{\infty} [F_n(x) - F_0(x)]^2 \, \psi(x) \, dF_0(x), \tag{11.3}
        \]
        where $n$ is the sample size and $\psi(x)$ is a weighting function for the squared deviations
        $[F_n(x) - F_0(x)]^2$. This class includes the Cramér–von Mises family of statistics, which
        encompasses GOF tests based on the Cramér–von Mises statistic, the
        Anderson–Darling statistic and the Watson statistic. Here, the more popular Cramér–von Mises
        and Anderson–Darling statistics are discussed. Details of the Watson
        statistic (commonly denoted by $U^2$), useful for some special cases such as points on a
        circle, can be found elsewhere.¹⁰ The essential difference between statistics in this class
        is that the weighting function, $\psi(x)$, is defined differently so as to weight deviations
        according to the importance attached to various portions of the distribution function.
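To make the role of the weighting function concrete, the sketch below evaluates two members of the quadratic class. It assumes the standard choices, which are not stated in the passage above: ψ(x) = 1 for the Cramér–von Mises statistic and ψ(x) = 1/[F_0(x)(1 − F_0(x))] for the Anderson–Darling statistic, which puts more weight on the tails. The closed-form sums are the usual computing formulas obtained by integrating (11.3) over the steps of the EDF. The function names, the sample and the fully specified N(0, 1) null are illustrative assumptions.

```python
import math
from statistics import NormalDist


def cramer_von_mises(z):
    """W^2, i.e. Q^2 with psi = 1; z holds F_0 at the sorted observations."""
    n = len(z)
    return 1 / (12 * n) + sum(
        (z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )


def anderson_darling(z):
    """A^2, i.e. Q^2 with psi = 1 / (F_0 * (1 - F_0)); needs 0 < z < 1."""
    n = len(z)
    s = sum(
        (2 * i + 1) * (math.log(z[i]) + math.log(1 - z[n - 1 - i]))
        for i in range(n)
    )
    return -n - s / n


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # illustrative data
F0 = NormalDist(0.0, 1.0).cdf              # assumed fully specified null
z = sorted(F0(x) for x in sample)
print(cramer_von_mises(z), anderson_darling(z))
```

Both functions take the same transformed values z, so the only difference between them is how deviations in different parts of the distribution are weighted.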
          In this section, the Kolmogorov–Smirnov $D$ statistic is described first, followed by
        the Cramér–von Mises and Anderson–Darling statistics.

        11.4.1 Kolmogorov–Smirnov

        The Kolmogorov–Smirnov test based on the $D$ statistic is perhaps the best-known EDF-based
        GOF test. The measure of deviation in such a test is the maximum
        absolute difference between $F_n(x)$ and $F_0(x)$:⁶,¹¹,¹²
        \[
          D = \sup_x |F_n(x) - F_0(x)|.
        \]
        Although this expression for $D$ looks difficult to handle analytically, the asymptotic distribution of this
        statistic was established by Kolmogorov,¹³ and later also by Feller¹² and Doob.¹¹
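Although D is defined as a supremum over all x, it can be computed from a finite sample, because the EDF is a step function and the largest deviation must occur just before or at one of its jumps. The following sketch assumes a continuous, fully specified F_0; the function name `ks_statistic`, the sample and the N(0, 1) null are illustrative. If the parameters of F_0 are estimated from the same data, the standard critical values for D no longer apply.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """D = sup_x |F_n(x) - F_0(x)| for a continuous hypothesized CDF."""
    z = sorted(cdf(x) for x in sample)  # F_0 at the order statistics
    n = len(z)
    # F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic
    # (0-based i), so the deviation is checked on both sides of each jump.
    d_plus = max((i + 1) / n - z[i] for i in range(n))
    d_minus = max(z[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # illustrative data
D = ks_statistic(sample, NormalDist(0.0, 1.0).cdf)
print(D)
```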
          Due to the structure of this statistic, a confidence band can easily be set up such that
        the true distribution, $F(x)$, lies entirely within the band with a probability of $1 - \alpha$. This
        can be done as follows. Given any true $F(x)$, if $d_\alpha$ is the critical value of $D$ for test size $\alpha$,
        \[
          P\{D > d_\alpha\} = \alpha.
        \]
        Since the event $D \le d_\alpha$ is the same as $|F_n(x) - F(x)| \le d_\alpha$ for every $x$,
        this probability statement can be inverted to give the confidence statement:
        \[
          P\{F_n(x) - d_\alpha \le F(x) \le F_n(x) + d_\alpha, \ \forall x\} = 1 - \alpha. \tag{11.4}
        \]
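A band of this form can be sketched in a few lines. The exact critical value d_α comes from the distribution of D and is normally read from tables. As a stand-in, the sketch below uses the Dvoretzky–Kiefer–Wolfowitz bound, P{D > ε} ≤ 2 exp(−2nε²), which is a substitution not made in the text and gives a band with coverage of at least 1 − α rather than exactly 1 − α. The function name `confidence_band` and the data are illustrative.

```python
import math
from bisect import bisect_right


def confidence_band(sample, alpha=0.05):
    """Return d_alpha and (x, lower, upper) triples at the observed points."""
    ordered = sorted(sample)
    n = len(ordered)
    # Assumption: DKW bound in place of the exact critical value of D.
    # Solving 2 * exp(-2 * n * eps**2) = alpha for eps gives:
    d_alpha = math.sqrt(math.log(2 / alpha) / (2 * n))
    band = []
    for x in ordered:
        f_n = bisect_right(ordered, x) / n  # EDF at x
        # Clip to [0, 1] because F(x) is a probability
        band.append((x, max(f_n - d_alpha, 0.0), min(f_n + d_alpha, 1.0)))
    return d_alpha, band


data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # illustrative data
d_alpha, band = confidence_band(data, alpha=0.05)
for x, lower, upper in band:
    print(f"{x:5.2f}  [{lower:.3f}, {upper:.3f}]")
```

With so few observations the band is wide; its half-width shrinks at the rate 1/√n as the sample grows.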