Page 190 - Six Sigma Advanced Tools for Black Belts and Master Black Belts

          August 31, 2006
Contingency Table Approach
      In order to compensate for scaling effects, a more appropriate measure for comparison
      of residuals is given by the adjusted residuals,
        \[ R_{\mathrm{adj}} = \frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}\,(1 - p_{i+})(1 - p_{+j})}}. \tag{12.7} \]
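A minimal sketch of equation (12.7) in Python may help make the scaling concrete. It assumes the usual estimates under independence, namely expected counts of (row total × column total) / n and marginal proportions p_i+ = n_i+ / n and p_+j = n_+j / n, and it assumes a square root in the denominator; these definitions lie outside this section, so they are assumptions here. The function name `adjusted_residuals` is hypothetical.

```python
import math

def adjusted_residuals(table):
    """Adjusted residuals for a two-way table given as a list of rows of counts.

    Assumes mu_hat[i][j] = n_i+ * n_+j / n (independence estimate) and that
    every row and column total lies strictly between 0 and n.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]

    result = []
    for i, row in enumerate(table):
        out = []
        for j, n_ij in enumerate(row):
            mu = row_tot[i] * col_tot[j] / n          # expected count
            p_row = row_tot[i] / n                    # p_i+
            p_col = col_tot[j] / n                    # p_+j
            denom = math.sqrt(mu * (1 - p_row) * (1 - p_col))
            out.append((n_ij - mu) / denom)
        result.append(out)
    return result

# Hypothetical 2 x 2 counts, for illustration only.
residuals = adjusted_residuals([[10, 20], [30, 40]])
```

In a 2 × 2 table all four adjusted residuals share the same magnitude, so the example is mainly a check on the arithmetic; larger tables are where cell-by-cell comparison becomes informative.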
        For the test of independence between two variables, the $X^2$ and $G^2$ statistics are
      sufficient for nominal data. However, if ordinal information is available, analysis based
      on the $X^2$ and $G^2$ statistics may not be as sensitive as a test that takes such
      information into account. This ordinal information can be derived from a natural ordering
      of the levels of the variables. When the association has a positive or negative trend
      over the range of ordered categories, tests that exploit this ordinal information are
      more sensitive to departures from the null hypothesis.
      A statistic that encapsulates the ordinal information is given by
        \[ M^2 = (n - 1)\,r^2, \tag{12.8} \]
      where $r$ is the Pearson product-moment correlation between X and Y, and $n$ is the total
      sample size. The null hypothesis for this statistic is independence between
      variables X and Y, and the alternative hypothesis is that the two variables are
      significantly correlated. Under the null hypothesis, the $M^2$ statistic follows a $\chi^2$
      distribution with 1 degree of freedom. The correlation $r$, which accounts for the ordinal
      information underlying the categories, can be calculated as follows:

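        \[ r = \frac{\sum_{i,j} u_i v_j n_{ij} - \left(\sum_i u_i n_{i+}\right)\left(\sum_j v_j n_{+j}\right)/n}
                    {\sqrt{\left[\sum_i u_i^2 n_{i+} - \left(\sum_i u_i n_{i+}\right)^2/n\right]
                           \left[\sum_j v_j^2 n_{+j} - \left(\sum_j v_j n_{+j}\right)^2/n\right]}}, \tag{12.9} \]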
      where the $u_i$ are the scores of the rows, with $u_1 \le u_2 \le u_3 \le \dots \le u_I$, and the
      $v_j$ are the scores of the columns, with $v_1 \le v_2 \le v_3 \le \dots \le v_J$.
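Equations (12.8) and (12.9) can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function name `linear_trend_test` and the example counts are hypothetical, and the p-value relies on the identity that, for a χ² variable with 1 degree of freedom, P(χ² > m) = erfc(√(m/2)).

```python
import math

def linear_trend_test(table, row_scores, col_scores):
    """Return (r, M2, p_value) for a two-way table with ordered categories.

    table      : list of rows of counts n_ij
    row_scores : scores u_1 <= ... <= u_I
    col_scores : scores v_1 <= ... <= v_J
    Assumes the scores are not all equal within rows or within columns.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]            # n_i+
    col_tot = [sum(col) for col in zip(*table)]      # n_+j

    sum_uv = sum(u * v * table[i][j]
                 for i, u in enumerate(row_scores)
                 for j, v in enumerate(col_scores))
    sum_u = sum(u * t for u, t in zip(row_scores, row_tot))
    sum_v = sum(v * t for v, t in zip(col_scores, col_tot))
    sum_u2 = sum(u * u * t for u, t in zip(row_scores, row_tot))
    sum_v2 = sum(v * v * t for v, t in zip(col_scores, col_tot))

    numerator = sum_uv - sum_u * sum_v / n
    denominator = math.sqrt((sum_u2 - sum_u ** 2 / n) *
                            (sum_v2 - sum_v ** 2 / n))
    r = numerator / denominator                      # equation (12.9)
    m2 = (n - 1) * r * r                             # equation (12.8)
    p_value = math.erfc(math.sqrt(m2 / 2))           # chi-squared tail, 1 df
    return r, m2, p_value

# Hypothetical counts with equally spaced scores, for illustration only.
r, m2, p = linear_trend_test([[10, 20], [30, 40]], [1, 2], [1, 2])
```

Using only the standard library keeps the sketch dependency-free; with a statistics package available, the tail probability would more naturally come from its χ² distribution functions.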
        From equation (12.9), the frequency counts can be seen to be weighted by the
      scores of the respective rows and columns. For most data sets, the choice of scores
      has little effect on the result if they are reasonably well chosen and equally spaced.
      However, in some cases, an imbalance in frequency counts over the categories may
      give different results for different scores. In such cases, a sensitivity analysis can be
      conducted to assess these differences across scoring systems. Other approaches
      suggested in the literature include the use of the data to assign scores automatically.¹
      However, such automatic scoring systems may not be appropriate in all circumstances. It
      is usually better to use reasonable domain knowledge to select scores that
      reflect the differences between categories.
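One way such a sensitivity analysis might look is sketched below: the $M^2$ statistic of equation (12.8) is recomputed under several candidate row scorings and the results are compared. The counts, the scoring systems, and the function name `m_squared` are all hypothetical and chosen only for illustration.

```python
import math

def m_squared(table, u, v):
    """M^2 = (n - 1) r^2, with r computed as in equation (12.9)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    s_uv = sum(u[i] * v[j] * table[i][j]
               for i in range(len(u)) for j in range(len(v)))
    s_u = sum(a * t for a, t in zip(u, row_tot))
    s_v = sum(b * t for b, t in zip(v, col_tot))
    ss_u = sum(a * a * t for a, t in zip(u, row_tot)) - s_u ** 2 / n
    ss_v = sum(b * b * t for b, t in zip(v, col_tot)) - s_v ** 2 / n
    r = (s_uv - s_u * s_v / n) / math.sqrt(ss_u * ss_v)
    return (n - 1) * r * r

# Hypothetical, deliberately unbalanced 3 x 3 table (last row is sparse).
counts = [[45, 30, 5],
          [30, 35, 10],
          [3, 4, 8]]
col_scores = [1, 2, 3]

# Hypothetical candidate row scorings that differ in the last gap.
row_scorings = {
    "equally spaced": [1, 2, 3],
    "wide last gap": [1, 2, 6],
    "narrow last gap": [1, 2, 2.5],
}

results = {name: m_squared(counts, u, col_scores)
           for name, u in row_scorings.items()}
for name, m2 in results.items():
    print(f"{name:>16}: M^2 = {m2:.3f}")
```

Because the correlation is unchanged by a positive linear rescaling of the scores, scorings such as (1, 2, 3) and (10, 20, 30) give the same $M^2$; only the relative spacing between categories matters, so that is what a sensitivity analysis should vary.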
        A two-way I × J contingency table can be generalized to a three-way I × J × K
      contingency table and even multi-way contingency tables. An example of an I × J × K
      three-way contingency table is shown in Table 12.2. In this chapter, only two-way
      contingency tables for both nominal and ordinal categorical data are dealt with. For
      multi-way contingency tables involving more variables, the reader is advised to refer
      to Agresti.¹