Page 188 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
      one member of this family of techniques that is particularly prevalent and effective
      is logistic regression. This is discussed for cases of single and multiple categorical or
      continuous explanatory variables in Section 12.4.
                     12.2  CONTINGENCY TABLE APPROACH


      Categorical data can typically be presented in tabular form when both the response
      and the explanatory variables are categorical in nature, or can be grouped into distinct
      categories. Variables that are categorical in nature are commonly referred to as factors,
      and their different categories as factor levels. In many situations,
      the data in contingency tables are the frequency counts of observations occurring for
      each possible factor-level combination.
        Table 12.1 shows a typical two-way contingency table for a simple situation with
      only two categorical variables, X and Y, with I and J levels, respectively. Each entry
      n_{ij} (i = 1, ..., I; j = 1, ..., J) in the table is the frequency count for the (i, j)
      factor-level combination. The marginal sums of each row are shown in the 'Total' column
      (n_{i+}), and the marginal sums of each column in the 'Total' row (n_{+j}). The total
      sample size is denoted by n.
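        As a minimal sketch of the layout just described, the marginal sums n_{i+} and n_{+j}
      and the total n can be computed directly from the cell counts. The 2 x 3 table of counts
      and the function name margins below are hypothetical and serve only as an illustration.

```python
# Hypothetical 2 x 3 table of counts n_ij
# (rows: levels of X, columns: levels of Y). Illustrative data only.
counts = [
    [20, 15, 5],
    [10, 25, 25],
]


def margins(table):
    """Return (row_totals, col_totals, n) for a rectangular table of counts."""
    row_totals = [sum(row) for row in table]        # n_{i+}, one per level of X
    col_totals = [sum(col) for col in zip(*table)]  # n_{+j}, one per level of Y
    n = sum(row_totals)                             # total sample size
    return row_totals, col_totals, n


row_totals, col_totals, n = margins(counts)
print("row totals n_i+:", row_totals)
print("column totals n_+j:", col_totals)
print("sample size n:", n)
```

      The row totals and the column totals each add up to the same n, which is a quick
      consistency check on any hand-built table.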
        A number of statistical measures and procedures have been proposed to assess the
      association or relationship between variables in categorical data analysis. Statistical
      measures such as sample proportions, relative risks and odds ratios can be used in
      the case of binary variables in two-way contingency tables (see the case study in
      Section 12.3).
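        For a 2 x 2 table, the measures named above can be sketched as follows. The counts
      are hypothetical, the function name two_by_two_measures is an illustrative choice, and
      the sketch assumes the usual textbook definitions with rows as the two groups (levels
      of X) and the first column as the outcome of interest; it also assumes no cell is zero.

```python
# Hypothetical 2 x 2 table: rows are the two levels of X (groups),
# columns are the two levels of Y (outcome present, outcome absent).
table = [
    [30, 70],
    [10, 90],
]


def two_by_two_measures(table):
    """Return (p1, p2, relative_risk, odds_ratio) for a 2 x 2 table of counts.

    Assumes every cell is positive so that no division by zero occurs.
    """
    (n11, n12), (n21, n22) = table
    p1 = n11 / (n11 + n12)                  # sample proportion in row 1
    p2 = n21 / (n21 + n22)                  # sample proportion in row 2
    relative_risk = p1 / p2                 # ratio of the two proportions
    odds_ratio = (n11 * n22) / (n12 * n21)  # cross-product ratio
    return p1, p2, relative_risk, odds_ratio


p1, p2, rr, odds = two_by_two_measures(table)
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")
print(f"relative risk = {rr:.3f}, odds ratio = {odds:.3f}")
```

      A relative risk or odds ratio close to 1 suggests little association between the two
      binary variables; values far from 1 suggest a stronger one.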
        Another key method is to use a rigorous statistical hypothesis test. Let π_{ij} denote
      the probability of an observation belonging to category X = i and Y = j. The π_{ij}
      thus define the joint probability distribution of X and Y. Denote by π_{i+} the marginal
      probability of an observation belonging to category X = i and by π_{+j} the marginal
      probability of an observation belonging to category Y = j. A typical hypothesis test
      for a two-way contingency table with only one response and one explanatory variable
      is as follows:

        H_0: π_{ij} = π_{i+} π_{+j} for all i and j   vs.   H_1: π_{ij} ≠ π_{i+} π_{+j} for at least one (i, j).
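        The factorization stated in the null hypothesis can be explored informally by
      comparing the observed joint proportions n_{ij}/n with the products of the marginal
      proportions. The sketch below does only that comparison and is not itself a formal
      test; the counts and the function name joint_vs_independent are hypothetical.

```python
# Hypothetical 2 x 3 table of counts n_ij (illustrative data only).
counts = [
    [20, 15, 5],
    [10, 25, 25],
]


def joint_vs_independent(table):
    """Return (observed, product) as two tables of proportions.

    observed[i][j] is the sample joint proportion n_ij / n.
    product[i][j] is the product of the sample marginal proportions,
    (n_i+ / n) * (n_+j / n), i.e. the joint proportion implied by independence.
    """
    n = sum(sum(row) for row in table)
    row_prop = [sum(row) / n for row in table]        # estimates of pi_{i+}
    col_prop = [sum(col) / n for col in zip(*table)]  # estimates of pi_{+j}
    observed = [[n_ij / n for n_ij in row] for row in table]
    product = [[r * c for c in col_prop] for r in row_prop]
    return observed, product


observed, product = joint_vs_independent(counts)
for i, (obs_row, prod_row) in enumerate(zip(observed, product), start=1):
    for j, (obs, prod) in enumerate(zip(obs_row, prod_row), start=1):
        print(f"cell ({i}, {j}): observed {obs:.3f}, under independence {prod:.3f}")
```

      Large gaps between the two columns of output point away from independence; how large
      is large enough is what a formal test has to decide.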
      The null hypothesis H_0 states that the variables X and Y are statistically independent.
      When this holds, the probability of an observation falling in any particular column
      does not depend on which row that observation belongs to. This results in the


      Table 12.1 Two-way contingency table.

                                                Y
                            Level 1      Level 2      ...      Level J      Total
      X        Level 1      n_{11}       n_{12}       ...      n_{1J}       n_{1+}
               Level 2      n_{21}       n_{22}       ...      n_{2J}       n_{2+}
               :            :            :                     :            :
               Level I      n_{I1}       n_{I2}       ...      n_{IJ}       n_{I+}
               Total        n_{+1}       n_{+2}       ...      n_{+J}       n