Page 203 - Six Sigma Advanced Tools for Black Belts and Master Black Belts

          August 31, 2006
                          Introduction to the Analysis of Categorical Data
          The preceding observations are based on the assumption that all other possible
        explanatory variables are held constant. As Table 12.7 suggests, other factors most
        probably affect the presence of satellite crabs simultaneously. More complex logistic
        regression models that take multiple explanatory variables into account may be
        necessary to obtain a model with higher predictive power. The plausibility of such
        models is investigated in the next subsection.



        12.4.4 Multiple logistic regression
        Simple logistic regression models with a single explanatory variable generalize to
        multiple logistic regression models with several explanatory variables in much the
        same way that simple linear regression models generalize to multiple linear
        regression models in OLS regression. A typical multiple logistic regression model
        for binary responses with k categorical variables and l continuous variables can be
        represented as follows:


        $$\ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_{C1} c_1 + \beta_{C2} c_2 + \cdots + \beta_{Ck} c_k + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_l x_l,$$

        where $\alpha$ is a constant, $c_i$ is the $i$th categorical explanatory variable and
        $\beta_{Ci}$ its slope parameter, $x_i$ is the $i$th continuous explanatory variable and
        $\beta_i$ its slope parameter, and $\pi$ is the probability of success.
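
        The model above is linear on the log-odds scale, so a probability is recovered by
        applying the inverse logit to the linear predictor. The following is a minimal
        sketch of that step only. The function name `success_probability` and all
        coefficient values are hypothetical, chosen for illustration; they are not fitted
        values from the text.

```python
import math

def success_probability(alpha, cat_coefs, cat_values, cont_coefs, cont_values):
    """Invert ln(pi / (1 - pi)) = alpha + sum(beta_Ci * c_i) + sum(beta_i * x_i)."""
    # Linear predictor on the log-odds (logit) scale.
    eta = alpha
    eta += sum(b * c for b, c in zip(cat_coefs, cat_values))
    eta += sum(b * x for b, x in zip(cont_coefs, cont_values))
    # Inverse logit maps the log-odds back to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and inputs (illustration only, not from Table 12.11).
pi = success_probability(
    alpha=-1.0,
    cat_coefs=[0.5], cat_values=[1],      # one categorical indicator, switched on
    cont_coefs=[0.2], cont_values=[2.5],  # one continuous variable
)
log_odds = math.log(pi / (1.0 - pi))      # recovers the linear predictor
```

        With these made-up inputs the linear predictor is zero, so the log-odds are zero
        and the two outcomes are equally likely.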
          In the horseshoe crab data shown in Table 12.7, the candidate categorical variables
        are crab color and spine condition. The continuous explanatory variables are crab
        weight and carapace width. The fitted model could potentially take the following
        form:


        $$\ln\left(\frac{\pi_{\mathrm{Pres}}}{1-\pi_{\mathrm{Pres}}}\right) = \alpha + \beta_{C1} c_1 + \beta_{C2} c_2 + \beta_1 x_1 + \beta_2 x_2 \tag{12.17}$$
        where $c_1$ is the color variable, $c_2$ the spine condition, $x_1$ the width, $x_2$ the
        weight, and $\pi_{\mathrm{Pres}}$ the probability of finding a satellite crab nearby. The
        categorical variable $c_1$ has four levels whereas $c_2$ has three; hence, three dummy
        variables are needed to describe $c_1$ completely and two are needed for $c_2$. The
        dummy variables for $c_1$ are denoted $c_{1i}$ for $i = 1, 2, 3$, and those for $c_2$ are
        denoted $c_{2j}$ for $j = 1, 2$. The MLEs of the parameters, evaluated using MINITAB,
        are shown in Table 12.11.
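
        The dummy coding described above can be sketched as follows. This is an
        illustration under stated assumptions: the integer level labels, the choice of the
        first level as the reference category, the helper names `dummy_code` and
        `design_row`, and the example measurements are all hypothetical. The source does
        not say which reference level MINITAB uses.

```python
def dummy_code(level, levels):
    """Indicator coding with the first level as reference: k levels -> k - 1 columns."""
    if level not in levels:
        raise ValueError(f"unknown level {level!r}")
    # The reference level (levels[0]) is encoded as all zeros.
    return [1 if level == other else 0 for other in levels[1:]]

# Assumed level labels: four colors and three spine conditions.
COLOR_LEVELS = [1, 2, 3, 4]
SPINE_LEVELS = [1, 2, 3]

def design_row(color, spine, width, weight):
    """Build one row [c11, c12, c13, c21, c22, x1, x2] for the model in (12.17)."""
    return dummy_code(color, COLOR_LEVELS) + dummy_code(spine, SPINE_LEVELS) + [width, weight]

# Hypothetical crab: third color level, first (reference) spine level.
row = design_row(color=3, spine=1, width=26.0, weight=2.5)
n_slopes = len(row)  # 3 color dummies + 2 spine dummies + width + weight
```

        Each row then carries seven explanatory columns in addition to the constant, one
        per slope parameter in the fitted model.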
          From Table 12.11, none of the explanatory variables appears to be significant.
        This contradicts the earlier analyses with a single categorical or continuous
        variable. Furthermore, the $G^2$ statistic calculated with Equation (12.16) using
        MINITAB is 26.3. The null hypothesis for the test based on this statistic states that
        the response is jointly independent of all the explanatory variables. Based on an
        asymptotic null $\chi^2$ distribution with 7 degrees of freedom (three dummy variables
        for color, two for spine condition, and one each for width and weight), there is very
        strong evidence that at least one of the explanatory variables has a significant
        effect on the response. The fact that every effect in the logistic regression table
        (Table 12.11) nevertheless appears not significant could indicate significant
        multicollinearity between the explanatory variables. Multicollinearity essentially
        refers to the presence of significant relationships between the explanatory
        variables such that some or all of the