Page 170 - Six Sigma Advanced Tools for Black Belts and Master Black Belts

                        11.3  PEARSON CHI-SQUARE TEST
The Pearson chi-square ($\chi^2_P$) GOF test belongs to a generic family of GOF tests which are based on the $X^2$ test statistic. The classical $X^2$ statistic is essentially given by$^{4,5}$

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \tag{11.1}$$

where $E_i$ is the expected frequency of sample observations in class $i$ given that the frequency of samples in each class follows the hypothesized distribution, and $O_i$ is the observed frequency of sample observations in class $i$. This test requires the classification of random sample observations from a population into $k$ mutually exclusive and exhaustive classes where the number of sample observations in each class is sufficiently large for the expected frequencies (based on some postulated distribution) in each category to be non-trivial. Given that the sample size is $n$, $E_i$ is given by

$$E_i = n p_{i,0},$$

where $p_{i,0}$ is the probability that a sample observation belongs to class $i$ under the null hypothesis.
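
As a minimal sketch of equation (11.1) and the definition of the expected frequencies, the statistic can be computed directly from class counts. The function names and the counts below are hypothetical, chosen only for illustration; the null hypothesis assumed here is five equally likely classes.

```python
def expected_frequencies(n, null_probs):
    """E_i = n * p_{i,0} for each class i."""
    return [n * p for p in null_probs]


def pearson_x2(observed, null_probs):
    """Classical Pearson X^2 statistic, equation (11.1)."""
    if len(observed) != len(null_probs):
        raise ValueError("one null probability is needed per class")
    n = sum(observed)  # sample size
    expected = expected_frequencies(n, null_probs)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for k = 5 classes, n = 100 observations.
observed = [18, 22, 20, 20, 20]
null_probs = [0.2] * 5  # assumed null hypothesis: all classes equally likely

x2 = pearson_x2(observed, null_probs)
# By hand: every E_i is 20, so X^2 = (4 + 4 + 0 + 0 + 0) / 20 = 0.4.
print(f"X^2 = {x2:.3f}")
```
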

When the null hypothesis holds, the limiting distribution of the $X^2$ statistic is $\chi^2$ with $(k - c - 1)$ degrees of freedom (where $k$ is the number of non-empty cells and $c$ is the number of parameters to be estimated).$^{5,6}$ Because the existing proofs rest on asymptotic arguments, the limiting $\chi^2$ distribution is a poor approximation to the distribution of $X^2$ when the number of samples is small. The lack of sensitivity in $\chi^2$ GOF tests with few observations has been frequently acknowledged.$^{7}$
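
The degrees-of-freedom rule above can be sketched in code as follows. This is an illustration using only the Python standard library, not a method taken from the text: the helper names are hypothetical, the upper-tail probability is obtained from the standard series for the regularized lower incomplete gamma function, and the input values ($X^2 = 0.4$, $k = 5$, $c = 0$) are assumed for the example. In practice a statistics library would normally supply the tail probability.

```python
import math


def degrees_of_freedom(k, c):
    """k non-empty cells, c estimated parameters -> k - c - 1."""
    df = k - c - 1
    if df <= 0:
        raise ValueError("need k - c - 1 > 0 degrees of freedom")
    return df


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x).

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Assumed example: X^2 = 0.4 from k = 5 cells with no estimated parameters.
df = degrees_of_freedom(k=5, c=0)
p_value = chi2_sf(0.4, df)
print(f"df = {df}, asymptotic p-value = {p_value:.3f}")
```

The p-value computed this way inherits the limitation noted above: it relies on the limiting $\chi^2$ distribution and should be read with caution when the sample is small.
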

The Pearson $\chi^2$ GOF test is perhaps one of the most commonly used tests due to its versatility. It can easily be applied to test any distributional assumption of discrete or continuous type for any univariate data set without having to know the value of the distributional parameters. The main disadvantage associated with this GOF test is its lack of sensitivity in detecting inadequate models when few observations are available, as sufficiently large sample sizes are required for the $\chi^2$ assumption to be valid. Furthermore, there is a need to rearrange the data into arbitrary cells. Such grouping of data can affect the outcome of tests, as the value of the $X^2$ statistic depends on how the data are grouped. Nonetheless, most reasonable choices for grouping the data should produce similar results. Some rule-of-thumb criteria are discussed in the following worked example.
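
The dependence of $X^2$ on the grouping can be explored with a short sketch. Everything here is an assumption made for illustration: the data are simulated (they are not the residuals of Table 11.1), the hypothesized distribution is normal with mean and standard deviation estimated from the sample (so $c = 2$), and the cells are chosen to be equiprobable under the fitted model, which is one common grouping choice rather than the one prescribed by the text.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def x2_normal_fit(data, k):
    """Pearson X^2 for a fitted normal model using k equiprobable cells.

    Returns (X^2, degrees of freedom), with df = k - c - 1 and c = 2
    because the mean and standard deviation are estimated from the data.
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Interior cell boundaries at the 1/k, 2/k, ... quantiles of the fit.
    edges = [fitted.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for x in data:
        observed[bisect_right(edges, x)] += 1
    expected = n / k  # each cell has probability 1/k under the fitted model
    x2 = sum((o - expected) ** 2 / expected for o in observed)
    return x2, k - 2 - 1


random.seed(0)  # simulated, hypothetical data
sample = [random.gauss(0.0, 1.0) for _ in range(100)]

for k in (5, 8, 10):
    x2, df = x2_normal_fit(sample, k)
    print(f"k = {k:2d}  X^2 = {x2:.3f}  df = {df}")
```

Running the loop with different values of `k` shows how the statistic and its degrees of freedom both change with the grouping, which is why the choice of cells needs some care.
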

Table 11.1 lists residuals derived from the flight times of model helicopters constructed during a helicopter experiment conducted in a design of experiments experiential learning segment of a Design for Six Sigma course conducted by the authors. Such helicopter experiments are described in Box and Liu.$^{8}$ Residuals are the differences between the sample observations and the fitted response values obtained from the prediction model. The prediction model is a multiple linear regression model that was established through a factorial experiment. Each residual ($e_i$) thus describes