Page 196 - Six Sigma Advanced Tools for Black Belts and Master Black Belts

      PhD RSE being employed in the private sector are 0.23 and the corresponding sample
      odds of a non-PhD RSE being employed in the private sector are 2.67. The sample
      odds ratio, defined as the odds of success for non-PhD RSEs over the odds of success
      for PhD RSEs, is 11.2. The confidence interval for this odds ratio can be found by using
      a large-sample normal approximation to the sampling distribution of $\ln\hat{\theta}$. The mean
      of this distribution is $\ln\theta$, with the asymptotic standard error given by

        \[ \mathrm{ASE}(\ln\hat{\theta}) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}. \]
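      A confidence interval for $\ln\theta$ can thus be evaluated from

        \[ \ln\hat{\theta} \pm z_{\alpha/2}\,\mathrm{ASE}(\ln\hat{\theta}). \]
      Exponentiating the endpoints of this interval gives the confidence interval for the
      odds ratio itself, which for this example is evaluated to be (10.3, 12.2).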
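
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the computation behind the figures quoted in the text: the cell counts below are hypothetical (the actual RSE table is not reproduced here), the function name odds_ratio_ci is invented for this sketch, the cell layout (rows as groups, first column as "success") is an assumption, and a 95% level (alpha = 0.05) is assumed because the text does not state one.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(n11, n12, n21, n22, alpha=0.05):
    """Large-sample (Wald) confidence interval for the odds ratio of a 2x2 table.

    Assumed layout: rows are the two groups, column 1 is "success".
    Returns (theta_hat, ase, (lower, upper)).
    """
    # Sample odds ratio: odds of success in row 1 over odds of success in row 2.
    theta_hat = (n11 * n22) / (n12 * n21)
    # Asymptotic standard error of ln(theta_hat).
    ase = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    log_theta = math.log(theta_hat)
    # Build the interval on the log scale, then exponentiate the endpoints.
    lower = math.exp(log_theta - z * ase)
    upper = math.exp(log_theta + z * ase)
    return theta_hat, ase, (lower, upper)


# Hypothetical counts, chosen only to exercise the function.
theta_hat, ase, (lo, hi) = odds_ratio_ci(40, 15, 20, 60)
print(f"odds ratio = {theta_hat:.2f}, ASE = {ase:.3f}, CI = ({lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log scale, the resulting interval for the odds ratio is not symmetric about the point estimate; the estimate is the geometric mean of the two endpoints.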


                    12.4 LOGISTIC REGRESSION APPROACH

      The previous section treats statistical inference procedures for detecting the presence
      of relationships between the response and explanatory variables. These techniques
      essentially form the bedrock of statistical tools for categorical data analysis. This
      section introduces a class of model-based statistical approaches to categorical data
      analysis built on the logistic regression model.
        There are many benefits associated with model-based approaches to characterizing
      the relationships between response and explanatory variables. Appropriate models
      allow statistically efficient estimation of the strength and importance of the effect
      of each explanatory variable and the interactions between them. Model-based
      techniques generally allow for more precise statistical estimates and stronger statistical
      inferences. Furthermore, a model-based paradigm is able to handle more complex
      cases involving multiple explanatory variables. In this section, a brief discussion of
      the logistic regression model as a special case of the generalized linear model is
      presented. This is followed by a description of logistic regression for the case of a single
      explanatory variable. An introduction to handling categorical responses with multiple
      explanatory variables is then presented.

      12.4.1 Logit link for logistic regression

      Logistic regression models are essentially generalized linear models (GLMs) which
      characterize relationships between a binary response variable and explanatory
      variables via a logit link function,

        \[ \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right), \]
      where $\pi(x)$ is the probability of success for the binary variable.
        A link function in generalized linear modeling terminology essentially describes
      the functional relationship between the mean of the random component and the
      systematic component in a GLM.
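
As a small numerical illustration of the logit link, the sketch below maps a probability to its log-odds and back. The function names logit and inv_logit are chosen for this example only; the one figure taken from the text is the sample odds of 0.23 for PhD RSEs, used to show that the logit of a probability is simply the natural log of the corresponding odds.

```python
import math


def logit(p):
    """Logit link: map a success probability p in (0, 1) to the log-odds."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse of the logit link: map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-eta))


# Odds of 0.23 (the sample odds quoted earlier for PhD RSEs) correspond to
# a success probability of odds / (1 + odds).
odds_phd = 0.23
p_phd = odds_phd / (1 + odds_phd)

eta = logit(p_phd)       # log-odds, equal to ln(0.23)
p_back = inv_logit(eta)  # recovers p_phd up to rounding error
print(f"p = {p_phd:.4f}, logit(p) = {eta:.4f}, inverse = {p_back:.4f}")
```

The logit takes values on the whole real line while the probability stays strictly between 0 and 1, so a predictor built on the logit scale cannot produce an invalid probability when mapped back.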