Page 190 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Contingency Table Approach 175
In order to compensate for scaling effects, a more appropriate measure for comparison
of residuals is given by the adjusted residuals,
$$ R_{\text{adj}} = \frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}\,(1 - p_{i+})(1 - p_{+j})}}. \tag{12.7} $$
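As a rough illustration of equation (12.7), the following sketch computes the adjusted residuals of a two-way table in plain Python, with the square root in the denominator as in the usual definition. The function name `adjusted_residuals` and the 2 × 2 counts are hypothetical and do not come from the text. The sketch assumes that $p_{i+}$ and $p_{+j}$ are the sample marginal proportions $n_{i+}/n$ and $n_{+j}/n$, and that $\hat{\mu}_{ij} = n\,p_{i+}\,p_{+j}$ is the expected count under independence.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals for a two-way table given as a list of rows of counts.

    Assumes every row and column total is positive and that no single row or
    column holds all observations (otherwise a denominator is zero).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        p_row = row_totals[i] / n          # p_{i+}
        out = []
        for j, n_ij in enumerate(row):
            p_col = col_totals[j] / n      # p_{+j}
            mu = n * p_row * p_col         # expected count under independence
            out.append((n_ij - mu) / math.sqrt(mu * (1 - p_row) * (1 - p_col)))
        residuals.append(out)
    return residuals


# Hypothetical counts, for illustration only
example = [[10, 20],
           [30, 40]]
for row in adjusted_residuals(example):
    print([round(value, 3) for value in row])
```

Cells whose adjusted residual is large in absolute value (values beyond roughly 2 are a common rule of thumb) are the ones that depart most from what independence would predict.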
For the test of independence between two variables, the $X^2$ and $G^2$ statistics are
sufficient for nominal data. However, if ordinal information is available, analysis based
on the $X^2$ and $G^2$ statistics may not be as sensitive as a test which takes such
information into account. This ordinal information can be derived from a natural ordering
of the levels of the variables. When the association has a positive or negative trend
over the range of ordered categories, tests which exploit this ordinal information
are more sensitive to departures from the null hypothesis.
A statistic that encapsulates the ordinal information is given by
$$ M^2 = (n - 1)\,r^2, \tag{12.8} $$
where $r$ is the Pearson product-moment correlation between X and Y, and $n$ is the total
sample size. The null hypothesis for this statistic is that X and Y are independent,
and the alternative hypothesis is that the two variables are correlated. Under the
null hypothesis, and for large samples, the $M^2$ statistic approximately follows a
$\chi^2$ distribution with 1 degree of freedom. The correlation $r$, which accounts for
the ordinal information underlying the categories, can be calculated as follows:
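$$ r = \frac{\sum_{i,j} u_i v_j n_{ij} - \left(\sum_i u_i n_{i+}\right)\left(\sum_j v_j n_{+j}\right)/n}{\sqrt{\left[\sum_i u_i^2 n_{i+} - \left(\sum_i u_i n_{i+}\right)^2/n\right]\left[\sum_j v_j^2 n_{+j} - \left(\sum_j v_j n_{+j}\right)^2/n\right]}}, \tag{12.9} $$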
where the $u_i$ are the scores of the rows, with $u_1 \le u_2 \le u_3 \le \dots \le u_I$, and the $v_j$
are the scores of the columns, with $v_1 \le v_2 \le v_3 \le \dots \le v_J$.
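Equations (12.8) and (12.9) can be sketched in a few lines of plain Python. The function name `linear_trend_test`, the scores and the counts below are hypothetical and not taken from the text. The p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ so that only the standard library is needed, and it relies on the large-sample $\chi^2$ approximation.

```python
import math


def linear_trend_test(table, row_scores, col_scores):
    """Return (r, M^2, p-value) for a two-way table with ordered categories.

    `table` is a list of rows of counts; `row_scores` (u_i) and `col_scores`
    (v_j) are the scores assigned to the row and column categories.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]          # n_{i+}
    col_totals = [sum(col) for col in zip(*table)]    # n_{+j}

    sum_uv = sum(u * v * n_ij
                 for u, row in zip(row_scores, table)
                 for v, n_ij in zip(col_scores, row))
    sum_u = sum(u * t for u, t in zip(row_scores, row_totals))
    sum_v = sum(v * t for v, t in zip(col_scores, col_totals))
    sum_u2 = sum(u * u * t for u, t in zip(row_scores, row_totals))
    sum_v2 = sum(v * v * t for v, t in zip(col_scores, col_totals))

    numerator = sum_uv - sum_u * sum_v / n
    denominator = math.sqrt((sum_u2 - sum_u ** 2 / n) * (sum_v2 - sum_v ** 2 / n))
    r = numerator / denominator                       # equation (12.9)
    m2 = (n - 1) * r ** 2                             # equation (12.8)
    p_value = math.erfc(math.sqrt(m2 / 2))            # upper tail of chi-square, 1 df
    return r, m2, p_value


# Hypothetical 3 x 3 table with equally spaced scores, for illustration only
counts = [[20, 10, 5],
          [10, 15, 10],
          [5, 10, 20]]
r, m2, p = linear_trend_test(counts, [1, 2, 3], [1, 2, 3])
print(f"r = {r:.3f}, M^2 = {m2:.3f}, p = {p:.4f}")
```

A small p-value here would point to a linear trend in the scored categories rather than to a general departure from independence.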
From equation (12.9), the frequency counts can be observed to be weighted by the
scores of the respective rows and columns. For most data sets, the choice of scores
has little effect on the result if they are reasonably well chosen and equally spaced.
However, in some cases, the imbalance in frequency counts over the categories may
give different results for different scores. In such cases, a sensitivity analysis can be
conducted to assess how the results differ across scoring systems. Other approaches
suggested in the literature¹ include the use of the data to assign scores automatically.
However, such automatic scoring systems may not be appropriate in all circumstances. It
is usually better to draw on reasonable domain knowledge in selecting scores that
reflect the differences between categories.
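One possible shape for such a sensitivity analysis is sketched below: the same table is scored under several candidate systems and the resulting $M^2$ values are compared. The helper `m_squared`, the counts and the three scoring systems are all hypothetical and chosen only for illustration. The helper computes $r$ from centred sums, which is algebraically equivalent to equation (12.9).

```python
def m_squared(table, u, v):
    """M^2 = (n - 1) r^2 for counts `table`, row scores `u` and column scores `v`."""
    cells = [(u[i], v[j], table[i][j])
             for i in range(len(u)) for j in range(len(v))]
    n = sum(count for _, _, count in cells)
    mean_u = sum(a * count for a, _, count in cells) / n
    mean_v = sum(b * count for _, b, count in cells) / n
    s_uv = sum((a - mean_u) * (b - mean_v) * count for a, b, count in cells)
    s_uu = sum((a - mean_u) ** 2 * count for a, _, count in cells)
    s_vv = sum((b - mean_v) ** 2 * count for _, b, count in cells)
    return (n - 1) * s_uv ** 2 / (s_uu * s_vv)


# Hypothetical, deliberately unbalanced 2 x 3 table (not from the text)
table = [[30, 8, 2],
         [20, 12, 8]]
row_scores = [0, 1]

# Candidate column scoring systems to compare (all assumptions)
scoring_systems = {
    "equally spaced": [1, 2, 3],
    "rescaled equal spacing": [0, 5, 10],
    "last category far apart": [1, 2, 10],
}

results = {name: m_squared(table, row_scores, scores)
           for name, scores in scoring_systems.items()}
for name, value in results.items():
    print(f"{name}: M^2 = {value:.3f}")
```

Because the correlation is unchanged by a linear rescaling of the scores, the first two systems give the same $M^2$; only a change in the relative spacing of the categories, as in the third system, can alter the result.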
A two-way I × J contingency table can be generalized to a three-way I × J × K
contingency table and even multi-way contingency tables. An example of an I × J × K
three-way contingency table is shown in Table 12.2. In this chapter, only two-way
contingency tables for both nominal and ordinal categorical data are dealt with. For
multi-way contingency tables involving more variables, the reader is advised to refer
to Agresti.¹