Page 188 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
one member of this family of techniques that is particularly prevalent and effective
is logistic regression. This is discussed for cases of single and multiple categorical or
continuous explanatory variables in Section 12.4.
12.2 CONTINGENCY TABLE APPROACH
Categorical data can typically be presented in a tabular format when both the response
and explanatory variables are categorical in nature, or can be defined in distinct
categories. Categorical variables are commonly referred to as factors, and their
categories as factor levels. In many situations, the data in a contingency table are
the frequency counts of observations occurring for each possible factor-level
combination.
Table 12.1 shows a typical two-way contingency table for a simple situation with
only two categorical variables, $X$ and $Y$, with $I$ and $J$ levels, respectively. Each entry
$n_{ij}$ ($i = 1, \ldots, I$; $j = 1, \ldots, J$) in the table is the frequency count for the
$(i, j)$ factor-level combination. The marginal sum of each row, $n_{i+}$, is shown in the
'Total' column, and the marginal sum of each column, $n_{+j}$, in the 'Total' row. The total
sample size is denoted by $n$.
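The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `two_way_table`, the factor names and the observations are hypothetical and do not come from the text.

```python
from collections import Counter


def two_way_table(pairs, x_levels, y_levels):
    """Count observations for every (x, y) factor-level combination."""
    counts = Counter(pairs)
    # n[i][j] is the frequency count for X = x_levels[i] and Y = y_levels[j]
    n = [[counts[(x, y)] for y in y_levels] for x in x_levels]
    row_totals = [sum(row) for row in n]        # n_{i+}, the 'Total' column
    col_totals = [sum(col) for col in zip(*n)]  # n_{+j}, the 'Total' row
    total = sum(row_totals)                     # n, the total sample size
    return n, row_totals, col_totals, total


# Hypothetical data: (machine, defect type) recorded for each inspected unit
observations = [
    ("A", "none"), ("A", "none"), ("A", "scratch"),
    ("B", "none"), ("B", "dent"), ("B", "scratch"), ("B", "scratch"),
]
n, row_totals, col_totals, total = two_way_table(
    observations, x_levels=["A", "B"], y_levels=["none", "scratch", "dent"]
)
print(n, row_totals, col_totals, total)
```

Here X (machine) has I = 2 levels and Y (defect type) has J = 3 levels, so the table has 2 x 3 cells plus the marginal totals.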
A number of statistical measures and procedures have been proposed to assess the
association or relationship between variables in categorical data analysis. Statistical
measures such as sample proportions, relative risks and odds ratios can be used in
the case of binary variables in two-way contingency tables (see the case study in
Section 12.3).
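As a rough sketch of these measures for a 2 x 2 table, the following Python function uses the standard textbook definitions. The function name and the counts are hypothetical, and the sketch assumes that no cell or row total is zero.

```python
def two_by_two_measures(n11, n12, n21, n22):
    """Sample proportions, relative risk and odds ratio for a 2 x 2 table.

    Rows are the two levels of X; column 1 is the outcome of interest.
    Assumes all counts are positive so that no division by zero occurs.
    """
    p1 = n11 / (n11 + n12)  # sample proportion of the outcome in row 1
    p2 = n21 / (n21 + n22)  # sample proportion of the outcome in row 2
    relative_risk = p1 / p2
    odds_ratio = (n11 * n22) / (n12 * n21)
    return p1, p2, relative_risk, odds_ratio


# Hypothetical counts: 20 of 100 units fail in group 1, 10 of 100 in group 2
p1, p2, rr, odds = two_by_two_measures(20, 80, 10, 90)
print(p1, p2, rr, odds)
```

A relative risk or odds ratio close to 1 suggests little association between the two binary variables; values far from 1 suggest a stronger one.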
Another key method is to use a rigorous statistical hypothesis test. Let $\pi_{ij}$ denote
the probability of an observation belonging to category $X = i$ and $Y = j$. The $\pi_{ij}$
thus define the joint probability distribution of $X$ and $Y$. Denote by $\pi_{i+}$ the marginal
probability of an observation belonging to category $X = i$ and by $\pi_{+j}$ the marginal
probability of an observation belonging to category $Y = j$. A typical hypothesis test
for a two-way contingency table with only one response and one explanatory variable
is as follows:
$$H_0: \pi_{ij} = \pi_{i+}\pi_{+j} \ \text{for all } i \text{ and } j \quad \text{vs.} \quad H_1: \pi_{ij} \neq \pi_{i+}\pi_{+j} \ \text{for at least one pair } (i, j).$$
The null hypothesis $H_0$ states that the variables $X$ and $Y$ are statistically independent.
When this holds, the probability of an observation falling in any particular column
is independent of which row that observation belongs to. This results in the
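Table 12.1 Two-way contingency table.

| X \ Y   | Level 1  | Level 2  | ... | Level J  | Total    |
|---------|----------|----------|-----|----------|----------|
| Level 1 | $n_{11}$ | $n_{12}$ | ... | $n_{1J}$ | $n_{1+}$ |
| Level 2 | $n_{21}$ | $n_{22}$ | ... | $n_{2J}$ | $n_{2+}$ |
| ...     | ...      | ...      | ... | ...      | ...      |
| Level I | $n_{I1}$ | $n_{I2}$ | ... | $n_{IJ}$ | $n_{I+}$ |
| Total   | $n_{+1}$ | $n_{+2}$ | ... | $n_{+J}$ | $n$      |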
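One way to examine the independence hypothesis numerically is to compare the observed counts with the counts expected when each joint probability equals the product of its marginal probabilities. The sketch below estimates the marginal probabilities by the sample proportions and summarizes the discrepancy with the Pearson chi-square statistic. That statistic is a common choice assumed here for illustration; it is not taken from the passage above, and the function name and the counts are hypothetical.

```python
def independence_check(n):
    """Compare observed counts with those expected under independence.

    n is a list of rows of observed counts; every marginal total is
    assumed to be positive.
    """
    row_totals = [sum(row) for row in n]
    col_totals = [sum(col) for col in zip(*n)]
    total = sum(row_totals)
    # Expected count under H0: n * (n_i+ / n) * (n_+j / n) = n_i+ * n_+j / n
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Pearson chi-square statistic (an assumed, standard measure of discrepancy)
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(n, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # Degrees of freedom for an I x J table: (I - 1)(J - 1)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, dof


# Hypothetical 2 x 2 table of observed counts
table = [[30, 10], [20, 40]]
expected, chi2, dof = independence_check(table)
print(expected, chi2, dof)
```

A large value of the statistic relative to a chi-square distribution with the stated degrees of freedom would count as evidence against independence. Computing a p-value is left out to keep the sketch free of third-party libraries.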