Page 171 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
        Goodness-of-Fit Tests for Normality
                   Table 11.1 Residuals from prediction model of flight times.
                   0.06  −0.28   0.54   0.29  −0.33   0.05  −0.19   0.55
                   0.39  −0.47  −0.39   0.69  −0.19  −0.62  −0.07  −0.58
                   0.29   0.30   0.15   0.48  −0.08  −0.33   0.24  −0.52
                   0.32  −0.49   0.07   0.50  −0.04   0.33   0.18  −0.20
                   0.14  −0.24   0.19  −0.19   0.03  −0.14   0.07  −0.57



        the error in the fit of the prediction model ($\hat{Y}_i$) to the ith observation ($Y_i$) and is given
        by
          $$e_i = Y_i - \hat{Y}_i.$$
          Regression analysis makes the usual assumptions that the errors are
        independent and follow a normal distribution with zero mean and constant variance.
        Hence, if the prediction model adequately describes the behavior of the
        actual system, its residuals should not exhibit tendencies towards non-normality.
        A GOF test for normality is usually conducted on these residuals to determine the
        adequacy of the prediction model.
          The population mean, μ, for this test is 0, and the standard deviation, σ, is assumed
        to be unknown and has to be estimated from the data. For theoretical consistency,
        these parameters should be estimated using the maximum likelihood method. Under
        the null hypothesis, the sample mean, $\bar{X}$, is a maximum likelihood estimator for the
        population mean. The population standard deviation for a sample of size n can be
        estimated from
          $$\hat{\sigma} = \left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \right]^{1/2}$$
        where $X_i$ represents observation i. This is the maximum likelihood estimator for the
        population standard deviation given the null hypothesis that the data comes from a
        normal distribution.
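
As a minimal sketch of the estimate described above, the following Python computes the sample mean and the maximum likelihood standard deviation (divisor n, not n − 1) for the residuals in Table 11.1. The function name `mle_normal` is a hypothetical helper introduced here for illustration, not something from the text.

```python
import math

# Residuals transcribed from Table 11.1 (40 values).
residuals = [
    0.06, -0.28, 0.54, 0.29, -0.33, 0.05, -0.19, 0.55,
    0.39, -0.47, -0.39, 0.69, -0.19, -0.62, -0.07, -0.58,
    0.29, 0.30, 0.15, 0.48, -0.08, -0.33, 0.24, -0.52,
    0.32, -0.49, 0.07, 0.50, -0.04, 0.33, 0.18, -0.20,
    0.14, -0.24, 0.19, -0.19, 0.03, -0.14, 0.07, -0.57,
]


def mle_normal(xs):
    """Return (sample mean, ML standard deviation) for a list of numbers.

    The ML estimate of sigma divides the sum of squared deviations by n,
    unlike the usual unbiased sample variance, which divides by n - 1.
    """
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    return mean, math.sqrt(variance)


x_bar, sigma_hat = mle_normal(residuals)
print(f"sample mean = {x_bar:.4f}, ML sigma = {sigma_hat:.4f}")
```

The divisor n is the only difference from the more familiar sample standard deviation; for n = 40 the two differ by a factor of sqrt(39/40), which is small but matters if the test is to stay consistent with maximum likelihood estimation.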
          The data is then ordered and grouped into k classes. The determination of class
        groupings for sample data sets has long been a subject of contention. A relatively
        robust rule-of-thumb method frequently suggested is for the expected frequency within
        each class to be at least 5. However, it should be noted that there has been no general
        agreement regarding this minimum expected frequency. If the expected frequency is
        too small, adjacent cells can be combined.
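
The text does not prescribe how adjacent cells should be combined, so the following is only one possible sketch: a greedy left-to-right pass that keeps accumulating neighboring cells until the pooled expected frequency reaches the chosen minimum. The function name `merge_small_cells` and the default threshold of 5 are assumptions made for illustration.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells until each pooled cell has expected >= min_expected.

    observed, expected: equal-length lists of per-cell frequencies, in order.
    Returns (merged_observed, merged_expected).
    """
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp, pending = 0, 0.0, 0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        pending += 1
        if acc_exp >= min_expected:
            # The pooled cell is large enough: close it and start a new one.
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp, pending = 0, 0.0, 0
    if pending:
        if merged_exp:
            # Trailing cells that are still too small join the last pooled cell.
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            # Nothing reached the threshold: return a single pooled cell.
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


obs, exp = merge_small_cells([1, 3, 8, 10, 6, 2], [2.0, 4.0, 9.0, 9.0, 4.0, 2.0])
print(obs, exp)
```

Other pooling strategies (for example, merging from the tails inward) are equally defensible; whichever is used, the number of classes k entering the test is the number of cells after pooling.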
          For a discrete distribution such as the Poisson or binomial, data can be naturally
        assigned to discrete classes, with class boundaries clearly defined, based on this rule
        of thumb. The classes may be set up so that the number of classes does not
        exceed n/5, which keeps the expected number of observations in each class at
        5 or more. The class boundaries can be determined in an equiprobable manner, where
        the probability of a random observation falling into each class is equal and estimated
        to be 1/k for k classes, as follows:
          $$P(x \le x_1) = \frac{1}{k},\quad P(x \le x_2) = \frac{2}{k},\quad \ldots,\quad P(x \le x_{k-1}) = \frac{k-1}{k}.$$
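
For a normal null distribution, the equiprobable boundaries above are the j/k quantiles of the fitted distribution. The sketch below uses `statistics.NormalDist` from the Python standard library to compute them and then counts observations per class. The helper names and the example parameters (mean 0, standard deviation 1, k = 8, which corresponds to n/5 for n = 40) are illustrative assumptions, not values taken from the text.

```python
from bisect import bisect_left
from statistics import NormalDist


def equiprobable_boundaries(mu, sigma, k):
    """Return the k - 1 boundaries x_1..x_{k-1} with P(x <= x_j) = j / k."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(j / k) for j in range(1, k)]


def class_counts(xs, boundaries):
    """Count observations per class; class j holds x_{j-1} < x <= x_j."""
    counts = [0] * (len(boundaries) + 1)
    for x in xs:
        # bisect_left gives the number of boundaries strictly below x.
        counts[bisect_left(boundaries, x)] += 1
    return counts


# Placeholder parameters for illustration only.
bounds = equiprobable_boundaries(mu=0.0, sigma=1.0, k=8)
print([round(b, 3) for b in bounds])
print(class_counts([-1.0, -0.1, 0.1, 1.0], bounds))
```

With equiprobable classes the expected frequency in every class is simply n/k, so choosing k no larger than n/5 satisfies the rule of thumb without any pooling.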