Page 196 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
PhD RSE being employed in the private sector are 0.23 and the corresponding sample
odds of a non-PhD RSE being employed in the private sector are 2.67. The sample
odds ratio, defined as the odds of success for non-PhD RSEs over the odds of success
for PhD RSEs, is 11.2. The confidence interval for this odds ratio can be found by using
a large-sample normal approximation to the sampling distribution of $\ln\hat{\theta}$. The mean
of this distribution is $\ln\theta$, with the asymptotic standard error given by
$$\mathrm{ASE}(\ln\hat{\theta}) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}.$$
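The confidence interval for $\ln\theta$ can thus be evaluated from
$$\ln\hat{\theta} \pm z_{\alpha/2}\,\mathrm{ASE}(\ln\hat{\theta}),$$
and exponentiating its endpoints gives the interval for the odds ratio itself.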
The confidence interval of the odds ratio for this example is evaluated to be (10.3,
12.2).
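The calculation above can be sketched in a few lines of Python. This is a minimal illustration and not the computation behind the figures quoted in the text: the function name `odds_ratio_ci` and the cell counts are hypothetical, row 1 of the table is assumed to be the group whose odds form the numerator, and `z = 1.96` assumes a 95% confidence level, which the text does not state.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Large-sample (Wald) confidence interval for the odds ratio of a 2x2 table.

    Rows are the two groups, columns are success/failure, so the sample
    odds ratio is (n11 / n12) / (n21 / n22).  z = 1.96 is an assumed
    95% level; change it for other confidence levels.
    """
    theta_hat = (n11 * n22) / (n12 * n21)
    # Asymptotic standard error of ln(theta_hat)
    ase = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta_hat)
    # Build the interval on the log scale, then exponentiate the endpoints
    lower = math.exp(log_theta - z * ase)
    upper = math.exp(log_theta + z * ase)
    return theta_hat, ase, lower, upper


# Hypothetical counts chosen only for illustration (not the book's data)
theta_hat, ase, lower, upper = odds_ratio_ci(30, 10, 20, 40)
print(f"odds ratio = {theta_hat:.2f}, ASE = {ase:.3f}")
print(f"confidence interval = ({lower:.2f}, {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is not symmetric about the sample odds ratio once the endpoints are exponentiated; the sample odds ratio is instead the geometric mean of the two limits.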
12.4 LOGISTIC REGRESSION APPROACH
The previous section treats statistical inference procedures for detecting the presence
of relationships between the response and explanatory variables. These techniques
essentially form the bedrock of statistical tools for categorical data analysis. In this
section a class of model-based statistical approaches based on the logistic regression
model for categorical data analysis is introduced.
There are many benefits associated with model-based approaches to characterizing
the relationships between response and explanatory variables. Appropriate models
allow statistically efficient estimation of the strength and importance of the effect
of each explanatory variable and the interactions between them. Model-based
techniques generally allow for more precise statistical estimates and stronger statistical
inferences. Furthermore, a model-based paradigm is able to handle more complex
cases involving multiple explanatory variables. In this section, a brief discussion of
the logistic regression model as a special case of the generalized linear model is
presented. This is followed by a description of logistic regression for the case of a single
explanatory variable. An introduction to handling categorical responses with multiple
explanatory variables is then presented.
12.4.1 Logit link for logistic regression
Logistic regression models are essentially generalized linear models (GLMs) which
characterize relationships between a binary response variable and explanatory
variables via a logit link function,
$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right),$$
where $\pi(x)$ is the probability of success for the binary variable.
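As a small illustration of the logit link, the sketch below maps a probability to the log-odds scale and back again. The function names `logit` and `inv_logit` are chosen here for convenience and do not come from the text; the inverse is written in a form intended to avoid overflow for large negative arguments.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)) of a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse of the logit: maps any real value back to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    # For negative eta, use the equivalent form so exp() stays small
    e = math.exp(eta)
    return e / (1.0 + e)


# Round trip for a few illustrative probabilities
for p in (0.1, 0.5, 0.9):
    eta = logit(p)
    print(f"p = {p:.1f}  logit = {eta:+.4f}  back-transformed = {inv_logit(eta):.4f}")
```

The logit takes probabilities in (0, 1) to the whole real line, which is what allows a linear function of the explanatory variables to be placed on the right-hand side without producing fitted probabilities outside that range.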
A link function, in generalized linear modeling terminology, describes the functional
relationship between the mean of the random component and the systematic component of a GLM.