Page 201 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Introduction to the Analysis of Categorical Data
likelihood ratio statistic is given by equation (12.16). The logarithm of a ratio of two
likelihoods can be written as a difference between two log likelihoods. A special case
of this form, used in likelihood ratio GOF tests, is the ‘deviance’ statistic given by
$$\text{Deviance} = -2\,(L_N - L_S)$$
where $L_S$ is the maximum log likelihood for a saturated model (with a separate
parameter for each observation, resulting in a perfect fit of the observed data) and
$L_N$ is the maximum log likelihood for a non-saturated model. For logistic regression
involving binomial random variables, the deviance statistic has an asymptotic null
$\chi^2$ distribution with $N - p$ degrees of freedom, where $N$ is the number of
distinct observations and $p$ is the number of parameters in the model. By utilizing
the $\chi^2$ distribution,
a statistical hypothesis test can be conducted to validate the explanatory power of a
postulated non-saturated model compared to the perfect explanatory power of a satu-
rated model for a given data set. The null hypothesis for this test is that all parameters
that are in the saturated model but not in the postulated non-saturated model are
zero. For the horseshoe crab example with a single continuous explanatory variable
(carapace width) the deviance of this model is calculated as 33.28 with 33 degrees of
freedom. This gives a p-value of 0.454. Hence, the null hypothesis cannot be rejected
at the 5% significance level.
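The p-value quoted above can be checked with a short calculation. The following is a minimal sketch using only the Python standard library; the helper name `chi2_sf` is hypothetical, and the incomplete gamma function is evaluated with a standard series and continued-fraction scheme rather than whatever routine MINITAB uses internally. With a deviance of 33.28 on 33 degrees of freedom, the result should land close to the 0.454 reported in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Hypothetical helper: evaluates the regularized incomplete gamma function
    with a power series (small x) or a Lentz continued fraction (large x).
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Common prefactor exp(-z) * z**a / Gamma(a), computed on the log scale.
    prefactor = math.exp(-z + a * math.log(z) - math.lgamma(a))
    if z < a + 1.0:
        # Series for the lower regularized gamma function P(a, z).
        term = 1.0 / a
        total = term
        n = a
        for _ in range(1000):
            n += 1.0
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * prefactor
    # Continued fraction for the upper regularized gamma function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * prefactor


# Values taken from the text: deviance 33.28 on 33 degrees of freedom.
p_value = chi2_sf(33.28, 33)
print(f"p-value = {p_value:.3f}")
# The null hypothesis is rejected at the 5% level only if p_value < 0.05.
print("reject H0 at 5%:", p_value < 0.05)
```

A p-value well above 0.05 means the postulated model is not significantly worse than the saturated model, which is the conclusion drawn above.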
It should be noted that there are other GOF tests for categorical data analysis with
continuous explanatory variables, such as the Hosmer–Lemeshow tests² and those
based on Pearson residuals. Such tests can be easily evaluated with statistical software
such as MINITAB. Details of these tests are given by Agresti.¹ The p-value is evaluated
together with the GOF statistics in MINITAB and can be used to provide a more
comprehensive judgment on the GOF of the estimated model.
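For grouped binomial data, the deviance and the Pearson chi-square statistic can both be computed directly from the observed counts and the fitted probabilities. The sketch below uses the usual textbook forms of these two statistics; the function names and the small data set are hypothetical illustrations, not the horseshoe crab data, and the Hosmer–Lemeshow grouping step is not shown.

```python
import math


def pearson_chi2(successes, trials, fitted):
    """Pearson statistic: sum of squared Pearson residuals over the groups."""
    return sum(
        (y - n * p) ** 2 / (n * p * (1.0 - p))
        for y, n, p in zip(successes, trials, fitted)
    )


def binomial_deviance(successes, trials, fitted):
    """Deviance = -2 (L_N - L_S) for grouped binomial observations."""
    total = 0.0
    for y, n, p in zip(successes, trials, fitted):
        mu = n * p  # fitted number of successes in this group
        if y > 0:
            total += y * math.log(y / mu)
        if y < n:
            total += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * total


# Hypothetical grouped data: successes out of trials, with fitted probabilities
# assumed to come from some already estimated logistic regression model.
successes = [5, 8, 12]
trials = [10, 12, 15]
fitted = [0.4, 0.7, 0.85]

print("Pearson X^2:", round(pearson_chi2(successes, trials, fitted), 4))
print("Deviance   :", round(binomial_deviance(successes, trials, fitted), 4))
```

Both statistics are zero when the fitted probabilities reproduce the observed proportions exactly, which is the saturated-model case.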
12.4.3 Logistic regression with single categorical explanatory variable
Logistic regression can easily be extended to model categorical explanatory variables
via the use of dummy variables. The procedure for handling categorical explanatory
variables is described in this section using the above horseshoe crab example. The
explanatory variable of interest is taken to be the spine condition. From Table 12.7, it
can be observed that spine condition is essentially a categorical variable with three
levels. Two binary dummy variables which take values of 0 or 1 are necessary to fully
describe the three levels of spine condition. The two dummy variables are denoted
as $c_{S1}$ and $c_{S2}$. The logistic regression model with a single categorical
explanatory variable is given by
$$\ln\left[\frac{\pi_{\mathrm{Pres}}(c_{S1}, c_{S2})}{1 - \pi_{\mathrm{Pres}}(c_{S1}, c_{S2})}\right] = \alpha_S + \beta_{S1} c_{S1} + \beta_{S2} c_{S2}.$$
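The dummy coding and the logit model above can be sketched in a few lines. This is an illustration under stated assumptions: spine condition level 1 is taken as the reference category (both dummies zero), the function names are hypothetical, and the coefficient values are placeholders chosen for the example, not the estimates from Table 12.9.

```python
import math


def dummy_code(spine_level):
    """Map a three-level spine condition (1, 2 or 3) to (c_S1, c_S2).

    Assumption: level 1 is the reference category, coded as (0, 0).
    """
    if spine_level not in (1, 2, 3):
        raise ValueError("spine_level must be 1, 2 or 3")
    c_s1 = 1 if spine_level == 2 else 0
    c_s2 = 1 if spine_level == 3 else 0
    return c_s1, c_s2


def presence_probability(spine_level, alpha_s, beta_s1, beta_s2):
    """Invert the logit: pi = 1 / (1 + exp(-(alpha + beta1*c1 + beta2*c2)))."""
    c_s1, c_s2 = dummy_code(spine_level)
    logit = alpha_s + beta_s1 * c_s1 + beta_s2 * c_s2
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder coefficients for illustration only (not fitted values).
alpha_s, beta_s1, beta_s2 = 0.5, -0.2, -0.8

for level in (1, 2, 3):
    prob = presence_probability(level, alpha_s, beta_s1, beta_s2)
    print(f"spine condition {level}: dummies={dummy_code(level)}, pi={prob:.3f}")
```

With this coding, $\alpha_S$ is the logit for the reference level, and each $\beta$ is the change in logit for the corresponding level relative to the reference.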
MINITAB offers the facility to analyze logistic regression models with categorical
explanatory variables. The output from MINITAB using the horseshoe crab dataset is
reproduced in Table 12.9. From the output, the MLE coefficients of both the constant
and the parameters of the two dummy variables, $\hat{\alpha}_S$, $\hat{\beta}_{S1}$ and
$\hat{\beta}_{S2}$, are found to be significant at the 5% level by looking at the
z-statistic and p-value. Similar to the logistic regression model with a single
continuous explanatory variable, the p-value is based on an asymptotic normal
distribution. The logit function for the spine conditions