Page 197 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Introduction to the Analysis of Categorical Data
The random component in GLMs describes the distribution of the random response
observations. The systematic component characterizes the relationship of the expected
response with the explanatory variables through the link function. In GLMs, the systematic
component involves a function which is linear in the parameters. Using a
simple linear model with one explanatory variable, x, the GLM for response variables
in logistic regression has the form
\[
\ln\left(\frac{\pi(x)}{1-\pi(x)}\right) = \alpha + \beta x, \tag{12.13}
\]
where α is a constant, and β is a slope parameter for the explanatory variable, x.
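
As a minimal illustration of (12.13), the Python sketch below inverts the logit to recover π(x) from the linear predictor α + βx. The function names and the values of α and β are hypothetical, chosen only for illustration; they are not estimates from any data set discussed in this chapter.

```python
import math

def logit(p):
    """Log odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

def success_probability(x, alpha, beta):
    """Invert (12.13): pi(x) = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# Hypothetical parameter values, for illustration only.
alpha, beta = -2.0, 0.5

for x in (0.0, 2.0, 4.0, 6.0):
    p = success_probability(x, alpha, beta)
    # Applying the logit to pi(x) should reproduce alpha + beta * x.
    print(f"x={x:.1f}  pi(x)={p:.4f}  logit={logit(p):.4f}")
```

The linear predictor can take any real value, while the inverse transformation always returns a probability strictly between 0 and 1.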
Besides the logit link, the relationship between the random component and the systematic
component can be modeled with other link functions such as the Gompertz
(complementary log-log) link, ln(−ln(1 − π)), and the probit/normit link, Φ⁻¹(π), where
Φ(·) is the standard normal cumulative distribution function. In practice, the canonical
link based on the response distribution is most commonly used in generalized linear
modeling. The canonical link is the link function which transforms the mean of the
response distribution into its natural parameter. Binary response variables can be
modeled with a Bernoulli distribution having the probability of success as its expected
value, or with a binomial distribution when the response is a sum of such Bernoulli
distributed binary responses. The canonical link function for random variables which
follow either the Bernoulli or the binomial distribution is the logit link.
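
The three link functions mentioned above can be compared numerically. The sketch below is only an illustration using the Python standard library; the function names are hypothetical, and the probit link is computed with `statistics.NormalDist().inv_cdf`, which assumes Python 3.8 or later.

```python
import math
from statistics import NormalDist

def logit_link(p):
    """Logit link: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def probit_link(p):
    """Probit/normit link: inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)

def cloglog_link(p):
    """Complementary log-log link: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Evaluate each link at a few probabilities.
for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  logit={logit_link(p):+.4f}  "
          f"probit={probit_link(p):+.4f}  cloglog={cloglog_link(p):+.4f}")
```

The logit and probit links are symmetric about π = 0.5, whereas the complementary log-log link is not, which is one practical reason for choosing between them.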
The use of the logit link in logistic regression models offers many other distinct
advantages for modeling binary response variables. The logit link function essentially
models the log of the odds of ‘success’. The odds of success can be evaluated directly by
taking the antilog of both sides of the GLM with the logit link function in (12.13):
\[
\frac{\pi(x)}{1-\pi(x)} = e^{\alpha} e^{\beta x}. \tag{12.14}
\]
From (12.14) it can be observed that the odds change multiplicatively by a factor of e^β
with each unit increase in x. The modeling of log odds also implies that the logistic
regression model can be readily used for the analysis of data from retrospective sampling
designs through the use of the odds ratio.¹ As can be observed from the definition
of the odds ratio in (12.11) and (12.12), the ratio does not change when the response
and explanatory variables are interchanged. In retrospective sampling designs such
as case-control studies, the number of cases for each category of the response variable
is fixed by the sampling design. Hence, in order to evaluate the conditional
distributions of the response, the symmetry property of the odds ratio can be used.
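
Both properties in this paragraph can be checked with a short Python sketch. It assumes the usual cross-product form of the odds ratio for a 2 × 2 table, n11·n22 / (n12·n21), since (12.11) and (12.12) are not reproduced here. The parameter values and the table of counts are made up for illustration only.

```python
import math

def odds(x, alpha, beta):
    """Odds of success from (12.14): exp(alpha) * exp(beta * x)."""
    return math.exp(alpha) * math.exp(beta * x)

def odds_ratio(table):
    """Cross-product odds ratio (n11 * n22) / (n12 * n21) of a 2x2 table."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

def transpose(table):
    """Interchange the row (explanatory) and column (response) variables."""
    (n11, n12), (n21, n22) = table
    return ((n11, n21), (n12, n22))

# Hypothetical parameter values, for illustration only.
alpha, beta = -2.0, 0.5

# A unit increase in x multiplies the odds by exp(beta).
ratio = odds(3.0, alpha, beta) / odds(2.0, alpha, beta)
print(f"odds(3)/odds(2) = {ratio:.6f}, exp(beta) = {math.exp(beta):.6f}")

# Made-up counts: rows are levels of x, columns are response outcomes.
counts = ((20, 80), (10, 90))
print(odds_ratio(counts), odds_ratio(transpose(counts)))
```

Because transposing the table leaves the cross-product ratio unchanged, the same odds ratio is obtained whether the sampling fixes the row totals or the column totals.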
12.4.2 Logistic regression with a single continuous explanatory variable
Agresti¹ gives an example on the study of the nesting of horseshoe crabs. Each female
horseshoe crab in the study has a male crab attached to her which is considered
her ‘husband’. An investigation was conducted to determine the factors which affect
whether a female horseshoe crab has any other male crabs residing nearby apart
from her husband. These male horseshoe crabs are called ‘satellites’. The presence
of satellites is thought to depend on various factors. Some of these possible factors
are the female crabs’ color, spine condition, weight, and carapace width. As the main