Page 204 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
          August 31, 2006
                               Logistic Regression Approach                  189
      Table 12.11 Logistic regression table for multiple logistic regression model in Equation (17).

                                                                                            95% CI of odds ratio
      Predictor                           Coefficient  SE Coeff.      Z  p-value  Odds ratio     Lower     Upper
      Constant                                −16.095      8.683  −1.85    0.064
      Color (dummy variables c11, c12, c13)
        3  (c11 = 1, c12 = 0, c13 = 0)          1.393      1.277   1.09    0.275        4.03      0.33     49.22
        4  (c11 = 0, c12 = 1, c13 = 0)          0.692      1.491   0.46    0.643        2.00      0.11     37.08
        5  (c11 = 0, c12 = 0, c13 = 1)         −0.090      1.899  −0.05    0.962        0.91      0.02     37.80
      Spine condition (dummy variables c21, c22)
        2  (c21 = 1, c22 = 0)                  −0.838      1.386  −0.60    0.546        0.43      0.03      6.55
        3  (c21 = 0, c22 = 1)                  −2.108      1.243  −1.69    0.090        0.12      0.01      1.39
      Width                                     0.521      0.451   1.15    0.249        1.68      0.69      4.08
      Weight                                    0.002      0.002   0.83    0.406        1.00      1.00      1.01
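The derived columns of Table 12.11 can be recovered from each coefficient and its standard error alone. The minimal Python sketch below assumes the usual Wald (normal-approximation) statistics, which the printed values are consistent with: Z = coefficient/SE, a two-sided p-value from the standard normal distribution, odds ratio = exp(coefficient), and 95% limits exp(coefficient ± 1.96 × SE). The function name wald_summary is hypothetical, chosen for illustration.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_summary(coef, se, z_crit=Z_975):
    """Derive Z, two-sided p-value, odds ratio and Wald 95% CI limits
    from a logistic regression coefficient and its standard error."""
    z = coef / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(coef)
    # Interval is built on the log-odds scale, then exponentiated
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    return {"z": z, "p": p, "odds_ratio": odds_ratio, "lower": lower, "upper": upper}


# (coefficient, SE) pairs copied from Table 12.11
rows = {
    "Color 3": (1.393, 1.277),
    "Spine condition 3": (-2.108, 1.243),
    "Width": (0.521, 0.451),
}

for name, (b, se) in rows.items():
    s = wald_summary(b, se)
    print(f"{name:18s} Z={s['z']:6.2f}  p={s['p']:.3f}  OR={s['odds_ratio']:.2f}  "
          f"CI=({s['lower']:.2f}, {s['upper']:.2f})")
```

For the Width row, the rounded inputs give Z ≈ 1.16 and a lower limit of about 0.70, against 1.15 and 0.69 in the table; the small gaps come from the coefficient and standard error being reported to only three decimals.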




      explanatory variables can be written in terms of the other explanatory variables. Such
      multicollinearity can seriously distort both the parameter estimates and their
      estimated variances. Because the inflated variances shrink the Z statistics, some
      explanatory variables that are genuinely significant may be wrongly judged
      insignificant by the usual hypothesis tests and dropped from the model.
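For ordinary least-squares regression the variance inflation can be written exactly, which makes the mechanism concrete. In the sketch below, R_j² is the coefficient of determination from regressing the jth explanatory variable on all the others and σ² is the error variance; these symbols are introduced here for illustration only. The logistic model behaves analogously through its weighted information matrix, so read this as an illustration of the mechanism rather than the exact logistic-regression result.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_i (x_{ij} - \bar{x}_j)^2} \cdot \frac{1}{1 - R_j^2},
\qquad
Z_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}
```

The factor 1/(1 − R_j²) is known as the variance inflation factor. For a fixed estimate, all else equal, the standard error grows like 1/√(1 − R_j²), so Z_j shrinks by the factor √(1 − R_j²); with R_j² = 0.9, for example, Z_j falls to about 32% of the value it would have if the explanatory variables were uncorrelated.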
        A simple and effective way to check for multicollinearity is to examine scatterplots
      of the explanatory variables. MINITAB's matrix plot function, which arranges the
      scatterplots of every pair of variables in a matrix, provides a compact visual summary
      of these relationships. Such a matrix plot is shown in Figure 12.2. It suggests a
      strong relationship between the two continuous variables, weight and carapace width.
      To remedy the multicollinearity, the weight variable is removed from consideration
      and the regression is redone on the remaining variables. Table 12.12 shows the
      logistic regression table for this reduced model.
        With the risk of multicollinearity mitigated, the multiple logistic regression model
      is examined for further refinement. The process of refining a complex regression
      model into a more parsimonious one that is about as effective is generally known as
      model selection. The key goal in model selection is a parsimonious model that still
      fits the actual data well. To achieve this, the model has to balance two conflicting
      objectives: enough complexity to model the actual data well, yet enough simplicity
      for practical interpretation. A wide variety of model selection algorithms for GLMs
      incorporating different statistical criteria