Page 191 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Introduction to the Analysis of Categorical Data
Table 12.2 Three-way contingency table.
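$$
\begin{array}{cc|cccc}
Z \text{ level} & X \text{ level} & Y \text{ level } 1 & Y \text{ level } 2 & \cdots & Y \text{ level } J \\
\hline
1 & 1 & n_{111} & n_{112} & \cdots & n_{11J} \\
  & 2 & n_{121} & n_{122} & \cdots & n_{12J} \\
  & \vdots & \vdots & \vdots & & \vdots \\
  & I & n_{1I1} & n_{1I2} & \cdots & n_{1IJ} \\
\hline
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\hline
k & 1 & n_{k11} & n_{k12} & \cdots & n_{k1J} \\
  & 2 & n_{k21} & n_{k22} & \cdots & n_{k2J} \\
  & \vdots & \vdots & \vdots & & \vdots \\
  & I & n_{kI1} & n_{kI2} & \cdots & n_{kIJ} \\
\hline
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\hline
K & 1 & n_{K11} & n_{K12} & \cdots & n_{K1J} \\
  & 2 & n_{K21} & n_{K22} & \cdots & n_{K2J} \\
  & \vdots & \vdots & \vdots & & \vdots \\
  & I & n_{KI1} & n_{KI2} & \cdots & n_{KIJ}
\end{array}
$$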
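The layout of Table 12.2 maps onto a three-dimensional array in which the cell count $n_{kij}$ sits at position (Z level $k$, X level $i$, Y level $j$). The short sketch below uses made-up counts for $K = 2$, $I = 2$, $J = 3$ and Python's 0-based indices. The function names `layer_totals`, `marginal_xy` and `grand_total` are hypothetical helpers, not part of any library. Each layer `counts[k]` is the $I \times J$ partial table for one level of $Z$. Summing over $Z$ gives the $X$–$Y$ marginal table $n_{+ij}$.

```python
# Hypothetical counts (not from any survey): counts[k][i][j] holds n_{kij},
# with k = Z level, i = X level, j = Y level (0-based indices).
counts = [
    [[10, 20, 30],   # Z level 1, X level 1
     [5, 15, 25]],   # Z level 1, X level 2
    [[8, 12, 16],    # Z level 2, X level 1
     [4, 6, 9]],     # Z level 2, X level 2
]


def layer_totals(counts):
    """n_{k++}: total count in each I x J partial table (one per Z level)."""
    return [sum(sum(row) for row in layer) for layer in counts]


def marginal_xy(counts):
    """Collapse over Z: entry [i][j] is n_{+ij} = sum over k of n_{kij}."""
    n_rows = len(counts[0])
    n_cols = len(counts[0][0])
    return [[sum(layer[i][j] for layer in counts) for j in range(n_cols)]
            for i in range(n_rows)]


def grand_total(counts):
    """n: total number of observations in the three-way table."""
    return sum(layer_totals(counts))


print(layer_totals(counts))   # [105, 55]
print(marginal_xy(counts))    # [[18, 32, 46], [9, 21, 34]]
print(grand_total(counts))    # 160
```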
12.3 CASE STUDY
This section describes a case study that demonstrates the use of contingency table
techniques for categorical data analysis. The case study is based on data from an
annual national survey of entities engaged in research and development. The
survey itself was much broader in scope and sought a variety of information on the
R&D activities of private and public organizations. Data on the sectoral employment
of research scientists/engineers (RSEs) by qualification level were extracted in order
to assess whether having a PhD is associated with a better chance of employment in
the private sector. The survey was not designed specifically for this objective;
however, the extracted data set is simple enough to demonstrate the contingency
table approach.
Table 12.3 cross-classifies 18 935 survey respondents by qualification and employment
sector. Here, the response variable is employment sector (whether or not an RSE is
employed in the private sector) and the explanatory variable is qualification level
(with or without a PhD). The association of interest is the conditional distribution of
private-sector employment given the qualification level of the RSEs. This is the
simplest case of contingency table analysis: each of the two variables has only two
categories, giving a 2 × 2 contingency table.
Response variables with only two categories are also called binary response variables.
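The quantities used in this case study can be sketched in a few lines of Python. This sketch rests on two assumptions. First, it takes equations (12.5), (12.6) and (12.7) to be the standard Pearson statistic, likelihood-ratio statistic and adjusted residual for a two-way table, with expected counts $\mu_{ij} = n_{i+} n_{+j} / n$ under independence. Second, it uses hypothetical counts, not the survey data in Table 12.3. All function names are illustrative. Rows are the explanatory variable (PhD, no PhD) and columns are the response (private sector, not private sector).

```python
import math


def expected_counts(table):
    """Expected counts under independence: mu_ij = n_i+ * n_+j / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def conditional_proportions(table):
    """Proportion in the first column (e.g. private sector) within each row."""
    return [row[0] / sum(row) for row in table]


def pearson_x2(table):
    """Pearson statistic X^2 = sum over cells of (n_ij - mu_ij)^2 / mu_ij."""
    mu = expected_counts(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, mu)
               for o, e in zip(orow, erow))


def likelihood_ratio_g2(table):
    """G^2 = 2 * sum of n_ij * log(n_ij / mu_ij); empty cells contribute 0."""
    mu = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for orow, erow in zip(table, mu)
                   for o, e in zip(orow, erow) if o > 0)


def adjusted_residuals(table):
    """(n_ij - mu_ij) / sqrt(mu_ij * (1 - p_i+) * (1 - p_+j)) for each cell."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    mu = expected_counts(table)
    return [[(table[i][j] - mu[i][j])
             / math.sqrt(mu[i][j] * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
             for j in range(len(col_tot))]
            for i in range(len(row_tot))]


def p_value_df1(stat):
    """Upper-tail chi-squared probability with 1 degree of freedom (2 x 2 table)."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical 2 x 2 table: rows = PhD / no PhD, columns = private / not private.
table = [[30, 20], [20, 30]]
print(conditional_proportions(table))      # [0.6, 0.4]
print(pearson_x2(table))                   # 4.0
print(round(likelihood_ratio_g2(table), 3))  # 4.027
print(adjusted_residuals(table))           # [[2.0, -2.0], [-2.0, 2.0]]
```

A 2 × 2 table has $(2-1)(2-1) = 1$ degree of freedom. For that reason `p_value_df1` uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which does not apply to larger tables. An adjusted residual larger than about 2 in absolute value suggests that a cell departs from independence.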
The $X^2$ and $G^2$ statistics can be calculated using equations (12.5) and (12.6) as 3847
and 3876, respectively. The adjusted residuals for each cell can be calculated using (12.7).
The $X^2$ and $G^2$ statistics and their corresponding probability values describe the
evidence against the null hypothesis. In this particular example, the overall computed $X^2$ and