Page 197 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
                          Introduction to the Analysis of Categorical Data
        The random component in GLMs describes the distribution of the random response
        observations. The systematic component characterizes the relationship of the expected
        response with the explanatory variables through the link function. In GLMs, the sys-
        tematic component involves a function which is linear in the parameters. Using a
        simple linear model with one explanatory variable, x, the GLM for response variables
        in logistic regression has the form

          \[ \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x, \tag{12.13} \]
        where α is a constant, and β is a slope parameter for the explanatory variable, x.
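
        As a minimal sketch of (12.13), the following Python snippet evaluates the logit and
        its inverse for a single explanatory variable. The function names and the values
        alpha = -1.0 and beta = 0.5 are illustrative assumptions, not estimates from any
        data set in the text.

```python
import math


def logit(p):
    """Log odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def success_probability(x, alpha, beta):
    """Invert (12.13): pi(x) = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


# Hypothetical parameter values, chosen only for illustration.
alpha, beta = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    pi_x = success_probability(x, alpha, beta)
    # The logit of pi(x) recovers the linear predictor alpha + beta * x.
    print(f"x={x:.1f}  pi(x)={pi_x:.4f}  logit={logit(pi_x):.4f}")
```

        With these assumed values the linear predictor is zero at x = 2, so pi(2) = 0.5,
        and the logit is linear in x even though pi(x) itself is not.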
          Besides the logit link, the relationship between the random component and the
        systematic component can be modeled with other link functions such as the Gompertz
        (complementary log-log) link, ln(−ln(1 − π)), and the probit/normit link, Φ⁻¹(π),
        where Φ(·) is the standard normal cumulative
        distribution function. In practice, the canonical link based on the response
        distribution is most commonly used in generalized linear modeling. The canonical link is the
        link function that sets the linear predictor equal to the natural parameter of the
        response distribution, expressed as a function of its mean.
        Binary response variables can be modeled with a Bernoulli distribution having
        the probability of success as its expected value or a binomial distribution when the
        response is a sum of such Bernoulli-distributed binary responses. The canonical link
        function for random variables that follow either the Bernoulli or the binomial
        distribution is the logit link.
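
        The three link functions named above can be compared numerically. The sketch below
        uses only the Python standard library; the function names are illustrative
        assumptions, and the probit link is computed with the standard normal quantile
        function from statistics.NormalDist.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def logit_link(p):
    """Logit link: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def cloglog_link(p):
    """Gompertz (complementary log-log) link: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))


def probit_link(p):
    """Probit/normit link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)


for p in (0.1, 0.5, 0.9):
    print(
        f"p={p:.1f}  logit={logit_link(p):+.4f}  "
        f"probit={probit_link(p):+.4f}  cloglog={cloglog_link(p):+.4f}"
    )
```

        The logit and probit links are symmetric about p = 0.5, where both equal zero,
        whereas the complementary log-log link is not.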
          The use of the logit link in logistic regression models offers many other distinct
        advantages for modeling binary response variables. The logit link function essentially
        models the log odds of ‘success’. The odds of success can be evaluated directly by taking
        the antilog of the GLM with the logit link function:
          \[ \frac{\pi(x)}{1 - \pi(x)} = e^{\alpha} e^{\beta x}. \tag{12.14} \]
        From (12.14) it can be observed that the odds change multiplicatively by a factor of e^β
        with each unit increase in x. The modeling of log odds also implies that the logistic
        regression model can be readily used for the analysis of data from retrospective
        sampling designs¹ through the use of the odds ratio. As can be observed from the definition
        of the odds ratio in (12.11) and (12.12), the ratio does not change when the response
        and explanatory variables are interchanged. In retrospective sampling designs such
        as case-control studies, the number of cases for each category of the response
        variable is fixed by the sampling design. Hence, in order to evaluate the conditional
        distributions of the response, the symmetry property of the odds ratio can be used.
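
        Both properties discussed in this paragraph can be checked with a short sketch. It
        assumes that the odds ratio of (12.11) and (12.12) is the usual cross-product ratio
        of a 2 x 2 table; the parameter values and cell counts are hypothetical and serve
        only as an illustration.

```python
import math


def odds(x, alpha, beta):
    """Odds of success from (12.14): exp(alpha) * exp(beta * x)."""
    return math.exp(alpha) * math.exp(beta * x)


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


def transpose(table):
    """Swap rows and columns, i.e. interchange response and explanatory roles."""
    return [list(row) for row in zip(*table)]


# Hypothetical parameter values.
alpha, beta = -1.0, 0.5

# A unit increase in x multiplies the odds by exp(beta), whatever the value of x.
factor = odds(3.0, alpha, beta) / odds(2.0, alpha, beta)
print(f"odds factor per unit of x: {factor:.6f}  exp(beta): {math.exp(beta):.6f}")

# Hypothetical 2x2 counts: rows = explanatory category, columns = response category.
counts = [[30, 70], [10, 90]]
print(f"odds ratio:            {odds_ratio(counts):.6f}")
print(f"after interchanging:   {odds_ratio(transpose(counts)):.6f}")
```

        Because the cross-product ratio is unchanged when rows and columns are swapped, an
        odds ratio estimated from a table whose response margins were fixed by the design
        can still be interpreted in terms of the response given the explanatory variable.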


        12.4.2 Logistic regression with a single continuous explanatory variable
        Agresti gives an example on the study of the nesting of horseshoe crabs.¹ Each female
        horseshoe crab in the study has a male crab attached to her which is considered
        her ‘husband’. An investigation was conducted to determine the factors which affect
        whether a female horseshoe crab has any other male crabs residing nearby apart
        from her husband. These male horseshoe crabs are called ‘satellites’. The presence
        of satellites is thought to depend on various factors. Some of these possible factors
        are the female crabs’ color, spine condition, weight, and carapace width. As the main