Page 169 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Goodness-of-Fit Tests for Normality
statistical models can range from theoretical parametric distributional models to more
empirical nonparametric or distribution-free models. Apart from the nature of the
processes that generate the data, the ‘appropriateness’ of a statistical model is
inadvertently also a function of the data collection and analysis processes. The decision
on whether a statistical model is appropriate for a particular data set typically rests
on three fundamental considerations: alignment with theoretical process assumptions,
robustness to departures from these assumptions, and the downstream data-analysis
procedures. On top of these considerations, the model has to be judged on how well it
represents the actual data. For this purpose, an entire class of statistical techniques
known as ‘goodness-of-fit’ (GOF) tests has been developed. Some of the more popular GOF
techniques for assessing how adequately the normal distribution represents the data are
reviewed in this chapter. Such GOF tests of normality are commonly encountered in
Six Sigma applications, as many Six Sigma statistical techniques rely on the normality
assumption.
The fundamental statistical hypothesis testing concepts underlying GOF tests are
discussed in Section 11.2 to set the correct framework for applying these tests
appropriately. This is followed by a discussion of several popular GOF tests. The basic
concepts are presented together with the pros and cons of each test. To aid
understanding, the application procedure for each test is illustrated with a practical
example.
11.2 UNDERLYING PRINCIPLES OF GOODNESS-OF-FIT TESTS
GOF tests were developed primarily from fundamental concepts in statistical hypothesis
testing attributed to Neyman and Pearson [3]. In statistical hypothesis testing, there
is always a statement of a ‘null’ hypothesis and an ‘alternative’ hypothesis which are
sets of mutually exclusive possibilities in a sample space. For a typical statistical GOF
test these are defined as follows:
$$H_0: F(x) = F_0(x) \quad \text{vs.} \quad H_1: F(x) \neq F_0(x)$$
where $F_0(x)$ is some hypothesized distribution function. In Six Sigma applications
this is usually taken to be the normal distribution. It must be stressed that the basic
intent here is to limit our risk against severe departure from normality. Generally, the
primary aim is not to claim that a particular hypothesized model, represented by the
distribution function $F_0(x)$, is proven to be representative of the real data, but to
warn us of significant departure from $F_0(x)$. It should also be noted that, as ‘all
models are wrong’, any preconceived $F_0(x)$ is always open to rejection as more
information becomes available (i.e. as the sample size increases).
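As a rough illustration of what 'departure from $F_0(x)$' can mean in practice, the sketch below measures the largest vertical gap between the empirical distribution function of a sample and a hypothesized normal distribution function. This is the Kolmogorov-Smirnov type of discrepancy, chosen here only as one convenient example. The function name `max_edf_discrepancy`, the sample values, and the parameters of $F_0$ are assumptions made for illustration and do not come from the text.

```python
from statistics import NormalDist


def max_edf_discrepancy(data, f0):
    """Largest vertical gap between the sample EDF and the hypothesized CDF f0."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        p = f0(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


# Hypothetical measurements (illustrative only).
sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2]

# Hypothesized F0: normal with assumed, fully specified mean and standard deviation.
f0 = NormalDist(mu=10.0, sigma=0.3).cdf

d = max_edf_discrepancy(sample, f0)
print(f"maximum EDF discrepancy: {d:.4f}")
```

A larger discrepancy is stronger evidence against $H_0$. Whether a given value is 'significant' depends on the sample size and on whether the parameters of $F_0$ were specified in advance or estimated from the same data, so the sketch deliberately stops short of a reject/accept decision.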
GOF tests fall naturally into four broad categories: (1) methods based on discrete
classification of data (Pearson $\chi_P^2$); (2) empirical distribution function (EDF)
based methods; (3) regression based methods; and (4) methods based on sample moments.
Tests of each type are discussed in Sections 11.3–11.7. An example is used to
demonstrate the application of each test. Finally, the power of these tests for the data
set in the example is compared.
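To make two of these categories concrete, the following sketch computes a Pearson-type statistic over equiprobable classes (category 1) and the sample skewness and kurtosis (category 4). It is a minimal illustration under assumptions: the function names, the choice of equiprobable classes, the number of classes, and the data are hypothetical, and the statistics are not necessarily in the exact form used in the later sections. Regression based and EDF based methods are not covered here.

```python
from statistics import NormalDist, fmean


def pearson_chi_square(data, f0_inv_cdf, k):
    """Pearson statistic using k classes that are equally probable under F0."""
    n = len(data)
    # Class boundaries at the 1/k, 2/k, ..., (k-1)/k quantiles of F0.
    cuts = [f0_inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for x in data:
        # Number of boundaries below x gives the class index.
        observed[sum(x > c for c in cuts)] += 1
    expected = n / k  # same expected count in every class
    return sum((o - expected) ** 2 / expected for o in observed)


def sample_skewness_kurtosis(data):
    """Moment-based skewness and kurtosis (a normal distribution has 0 and 3)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical measurements and an assumed, fully specified normal F0.
sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 10.5, 9.7, 10.0, 10.1]
f0 = NormalDist(mu=10.0, sigma=0.3)

chi2 = pearson_chi_square(sample, f0.inv_cdf, k=4)
skew, kurt = sample_skewness_kurtosis(sample)
print(f"chi-square: {chi2:.3f}, skewness: {skew:.3f}, kurtosis: {kurt:.3f}")
```

In both cases the statistic summarizes departure from the hypothesized normal model: a large Pearson statistic, or skewness and kurtosis far from 0 and 3, points towards $H_1$.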

