Page 204 - Six Sigma Advanced Tools for Black Belts and Master Black Belts

August 31, 2006
Logistic Regression Approach 189
Table 12.11 Logistic regression table for multiple logistic regression model in Equation (17).
Predictors            Coefficient  SE Coeff.        Z  p-value   Odds ratio (95% CI)
Constant                  −16.095      8.683    −1.85    0.064

Color (dummy variable combination c11 c12 c13)
  3  (1 0 0)                1.393      1.277     1.09    0.275   4.03 (0.33, 49.22)
  4  (0 1 0)                0.692      1.491     0.46    0.643   2.00 (0.11, 37.08)
  5  (0 0 1)               −0.090      1.899    −0.05    0.962   0.91 (0.02, 37.80)

Spine condition (dummy variable combination c21 c22)
  2  (1 0)                 −0.838      1.386    −0.60    0.546   0.43 (0.03, 6.55)
  3  (0 1)                 −2.108      1.243    −1.69    0.090   0.12 (0.01, 1.39)

Width                       0.521      0.451     1.15    0.249   1.68 (0.69, 4.08)
Weight                      0.002      0.002     0.83    0.406   1.00 (1.00, 1.01)
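Every entry in a row of Table 12.11 follows from that row's coefficient and standard error. The Z value is coefficient/SE, which is the Wald statistic. The p-value is two-sided and comes from the standard normal distribution. The odds ratio is exp(coefficient), and the 95% CI is exp(coefficient ± 1.96 SE). The Python sketch below recomputes these quantities from the printed values. The function name wald_summary is hypothetical and is not part of MINITAB.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Recompute Z, two-sided p-value, odds ratio and 95% CI for one row
    of a logistic regression table from its coefficient and SE."""
    z = coef / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(coef)
    # The CI is built on the log-odds scale, then exponentiated
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    return z, p, odds_ratio, lower, upper


# Coefficients and SEs copied from Table 12.11
rows = [("Color 3", 1.393, 1.277), ("Spine 3", -2.108, 1.243), ("Width", 0.521, 0.451)]
for name, coef, se in rows:
    z, p, orat, lo, hi = wald_summary(coef, se)
    print(f"{name}: Z={z:.2f} p={p:.3f} OR={orat:.2f} CI=({lo:.2f}, {hi:.2f})")
```

The printed coefficients and SEs are themselves rounded, so a recomputed value can differ from the table in the last digit. For Width, 0.521/0.451 gives a Z of 1.16 rather than the printed 1.15, and the lower limit comes out as 0.70 rather than 0.69.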

explanatory variables can be written in terms of the other explanatory variables. Such
multicollinearity can seriously distort both the parameter estimates and their
estimated variances. The standard errors of the affected coefficients are inflated, so
their Z values shrink. Hence, explanatory variables that are in fact significant may be
incorrectly rejected by the usual statistical hypothesis tests.
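The variance inflation factor (VIF) makes the effect on the variance estimates concrete. VIF is a standard diagnostic, but the text does not itself use it. The sketch below assumes the simplest setting: ordinary least squares with two standardized predictors whose correlation is r. In that setting each slope's variance is multiplied by 1/(1 − r²) relative to uncorrelated predictors. In logistic regression the same mechanism acts through the weighted information matrix, so these numbers are only illustrative. The function name slope_variance_factor is hypothetical.

```python
import numpy as np


def slope_variance_factor(r):
    """Diagonal entry of inv(R) for two standardized predictors with
    correlation r. In least squares, Var(b_j) = sigma^2 / n * [inv(R)]_jj,
    so this entry is the factor by which correlation inflates each
    slope's variance (the VIF)."""
    R = np.array([[1.0, r], [r, 1.0]])
    return np.linalg.inv(R)[0, 0]


for r in (0.0, 0.5, 0.9, 0.99):
    vif = slope_variance_factor(r)
    # Standard errors scale with sqrt(VIF), so Wald Z values shrink by that factor
    print(f"r={r:.2f}  VIF={vif:.2f}  SE inflation={np.sqrt(vif):.2f}  "
          f"closed form={1 / (1 - r**2):.2f}")
```

At r = 0.9, for example, each standard error is roughly 2.3 times what it would be with uncorrelated predictors. Other things equal, a Z value of about 2.3 would therefore drop to about 1, and the variable would look non-significant.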
A simple and effective way to check for multicollinearity between pairs of variables is
to inspect scatterplots. The matrix plot function in MINITAB, which arranges the
scatterplots for every pair of variables in a matrix, gives a compact visual summary of
these relationships. Such a matrix plot is shown in Figure 12.2. The matrix plot shows
what appears to be a strong relationship between the two continuous variables, weight
and carapace width. To remedy this multicollinearity, the weight variable is removed
from consideration, and the regression is refitted on the remaining variables.
Table 12.12 shows the logistic regression table for this reduced model.
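A numeric counterpart to the matrix plot is to compute every pairwise correlation and flag the pairs above a cut-off. The sketch below uses synthetic data, not the data behind Figure 12.2. The 0.8 threshold and the function name flag_correlated_pairs are illustrative assumptions, not rules from the text. Like the scatterplot matrix, this check detects only pairwise relationships.

```python
import numpy as np


def flag_correlated_pairs(data, names, threshold=0.8):
    """Return (name_i, name_j, r) for every pair of columns whose absolute
    Pearson correlation exceeds the threshold (a numeric stand-in for
    eyeballing a scatterplot matrix)."""
    r = np.corrcoef(data, rowvar=False)  # columns are variables
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged


# Synthetic data only: weight is generated from width plus noise, so the two
# are strongly related, mimicking (not reproducing) the pattern in Figure 12.2.
rng = np.random.default_rng(seed=1)
width = rng.normal(26.0, 2.0, size=200)
weight = -3000 + 210 * width + rng.normal(0.0, 150.0, size=200)
noise_var = rng.normal(0.0, 1.0, size=200)  # unrelated predictor for contrast

names = ["width", "weight", "noise"]
pairs = flag_correlated_pairs(np.column_stack([width, weight, noise_var]), names)
print(pairs)

# Remedy used in the text: drop one member of the collinear pair (here, weight)
kept = [n for n in names if n != "weight"]
print("refit with:", kept)
```

The check cannot decide which member of a flagged pair to drop. That choice is left to subject-matter judgment, as in the text's removal of weight.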
With the risk of multicollinearity mitigated, the potential for further refining the
multiple logistic regression model is investigated. The process of refining a complex
regression model into a more parsimonious one that is just as effective is generally
known as model selection. The key goal in model selection is a parsimonious model that
still fits the actual data well. To achieve this, the model has to balance two
conflicting objectives: sufficient complexity to model the actual data well, yet
sufficient simplicity for practical interpretation. A wide variety of model selection
algorithms for GLMs incorporating different statistical criteria
