Page 173 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Goodness-of-Fit Tests for Normality
as follows:
\[
F_n(x) =
\begin{cases}
0, & x < x_{(1)}, \\[4pt]
\dfrac{r}{n}, & x_{(r)} \le x < x_{(r+1)}, \\[4pt]
1, & x_{(n)} \le x.
\end{cases}
\]
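The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the text: the function name `edf` and the sample values are hypothetical. It relies on the fact that the value r/n is simply the proportion of observations that do not exceed x.

```python
from bisect import bisect_right


def edf(sample):
    """Return F_n, the empirical distribution function of `sample`.

    `edf` is an illustrative name, not one used in the text.
    """
    ordered = sorted(sample)  # the order statistics x_(1) <= ... <= x_(n)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right counts the observations <= x, which is r in r/n.
        return bisect_right(ordered, x) / n

    return F_n


# Hypothetical data, used only to show the step behaviour.
F_n = edf([2.1, 3.4, 1.7, 2.9])
print(F_n(1.0), F_n(2.1), F_n(3.0), F_n(5.0))  # prints 0.0 0.5 0.75 1.0
```

Sorting once and using a binary search keeps each evaluation at O(log n), which matters when the EDF is evaluated at many points.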
The development of EDF-based approaches initially focused on continuous data.
Subsequently, several modifications were developed for discrete and grouped data.
EDF-based statistics can be broadly classified into two types, based on how the
deviation between the EDF ($F_n$) and the hypothesized CDF ($F_0$) is measured.$^{9}$ The
'supremum' class of statistics essentially computes the maximum deviation between
$F_n$ and $F_0$. This class includes the well-known Kolmogorov–Smirnov $D$ statistic. The
'quadratic' class computes the following measure of deviation between $F_n$ and $F_0$:
\[
Q^2 = n \int_{-\infty}^{\infty} \left[F_n(x) - F_0(x)\right]^2 \psi(x)\, dF_0(x), \tag{11.3}
\]
where $n$ is the sample size and $\psi(x)$ is a weighting function for the squared deviations
$[F_n(x) - F_0(x)]^2$. This class includes the Cramér–von Mises family of statistics, which
encompasses GOF tests that utilize statistics such as the Cramér–von Mises statistic, the
Anderson–Darling statistic and the Watson statistic. Here, the more popular
Cramér–von Mises and Anderson–Darling statistics are discussed. Details of the Watson
statistic (commonly denoted by $U^2$), useful for some special cases such as points on a
circle, can be found elsewhere.$^{10}$ The essential difference between statistics in this class
is that the weighting function, $\psi(x)$, is defined differently so as to weight deviations
according to the importance attached to various portions of the distribution function.
In this section, the Kolmogorov–Smirnov $D$ statistic is described first, followed by
the Cramér–von Mises and Anderson–Darling statistics.
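To make the role of the weighting function concrete, the sketch below evaluates two members of the quadratic class using their standard computing formulas. The weights are the usual textbook choices and are an assumption here, since this passage does not state them: ψ(x) = 1 gives the Cramér–von Mises statistic W², and ψ(x) = 1/{F_0(x)[1 − F_0(x)]} gives the Anderson–Darling statistic A², which emphasizes the tails. The function names and the data are hypothetical, and F_0 is assumed to be fully specified (no parameters estimated from the sample).

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with known mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quadratic_statistics(sample, cdf):
    """Return (W2, A2) for a fully specified hypothesized CDF `cdf`.

    Computing formulas, with u_i = F_0(x_(i)) and i = 1..n:
      W2 = 1/(12n) + sum (u_i - (2i - 1)/(2n))^2
      A2 = -n - (1/n) sum (2i - 1) [ln u_i + ln(1 - u_(n+1-i))]
    """
    u = [cdf(x) for x in sorted(sample)]
    n = len(u)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # The loop index i is 0-based, so 2i + 1 plays the role of 2i - 1.
    w2 = 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    # math.log raises ValueError if some u_i is exactly 0 or 1,
    # which can happen for observations far out in the tails.
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    ) / n
    return w2, a2


# Hypothetical data tested against a standard normal F_0.
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
w2, a2 = quadratic_statistics(data, normal_cdf)
print(f"W2 = {w2:.4f}, A2 = {a2:.4f}")
```

Both formulas follow from carrying out the integral in (11.3) over the steps of the EDF, so no numerical integration is needed.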
11.4.1 Kolmogorov–Smirnov
The Kolmogorov–Smirnov test based on the $D$ statistic is perhaps the best-known EDF-based
GOF test. The measure of deviation in such a test is essentially the maximum
absolute difference between $F_n(x)$ and $F_0(x)$:$^{6,11,12}$
\[
D = \sup_{x} \left| F_n(x) - F_0(x) \right| .
\]
Although this expression for $D$ looks problematic, the asymptotic distribution of this
statistic was established by Kolmogorov,$^{13}$ and later also by Feller$^{12}$ and Doob.$^{11}$
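In practice the supremum need not be searched over all x. Because F_n is a step function and a continuous F_0 is non-decreasing, the largest gap must occur at an order statistic, either just at or just before a jump. The following sketch uses that fact; the function names and data are hypothetical, and F_0 is assumed to be continuous and fully specified.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with known mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Return D = sup_x |F_n(x) - F_0(x)| for a continuous, fully specified F_0."""
    u = [cdf(x) for x in sorted(sample)]  # u_i = F_0(x_(i))
    n = len(u)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # D+ : EDF above F_0, checked just after each jump (F_n = i/n).
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    # D- : EDF below F_0, checked just before each jump (F_n = (i-1)/n).
    d_minus = max(u[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


# Hypothetical data tested against a standard normal F_0.
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
print(f"D = {ks_statistic(data, normal_cdf):.4f}")
```

Checking both sides of every jump is what makes the result a true supremum; comparing only i/n with F_0(x_(i)) would miss deviations where the EDF lies below F_0.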
Due to the structure of this statistic, a confidence band can easily be set up such that
the true distribution, $F(x)$, lies entirely within the band with a probability of $1 - \alpha$. This
can be done as follows. Given any true $F(x)$, if $d_\alpha$ is the critical value of $D$ for test size $\alpha$,
\[
P\{D > d_\alpha\} = \alpha.
\]
This probability statement can be inverted to give the confidence statement:
\[
P\{F_n(x) - d_\alpha \le F(x) \le F_n(x) + d_\alpha,\ \forall x\} = 1 - \alpha. \tag{11.4}
\]
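A band of the form (11.4) can be sketched as follows. The passage does not tabulate d_α, so this example substitutes a half-width from the Dvoretzky–Kiefer–Wolfowitz inequality, P{D > ε} ≤ 2 exp(−2nε²), which gives ε = sqrt(ln(2/α) / (2n)). That substitution is an assumption of this sketch: the resulting band has coverage of at least 1 − α rather than exactly 1 − α. The function names and data are hypothetical.

```python
import math
from bisect import bisect_right


def dkw_half_width(n, alpha=0.05):
    """Conservative stand-in for d_alpha from the DKW inequality (an assumption)."""
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def confidence_band(sample, alpha=0.05):
    """Return a function x -> (lower, upper) bounding the true F(x)."""
    ordered = sorted(sample)
    n = len(ordered)
    d = dkw_half_width(n, alpha)

    def band(x):
        f_n = bisect_right(ordered, x) / n  # EDF value at x
        # Clip to [0, 1], since any distribution function lies in that range.
        return max(0.0, f_n - d), min(1.0, f_n + d)

    return band


# Hypothetical data.
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
band = confidence_band(data, alpha=0.05)
for x in (-1.0, 0.0, 1.0):
    lower, upper = band(x)
    print(f"x = {x:+.1f}: {lower:.3f} <= F(x) <= {upper:.3f}")
```

The band has the same half-width at every x, which mirrors the structure of (11.4): a single critical value bounds the deviation simultaneously over the whole real line.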