Page 171 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Goodness-of-Fit Tests for Normality
Table 11.1 Residuals from prediction model of flight times.
0.06 −0.28 0.54 0.29 −0.33 0.05 −0.19 0.55
0.39 −0.47 −0.39 0.69 −0.19 −0.62 −0.07 −0.58
0.29 0.30 0.15 0.48 −0.08 −0.33 0.24 −0.52
0.32 −0.49 0.07 0.50 −0.04 0.33 0.18 −0.20
0.14 −0.24 0.19 −0.19 0.03 −0.14 0.07 −0.57
the error in the fit of the prediction model ($\hat{Y}_i$) to the $i$th observation ($Y_i$) and is given
by
$$e_i = Y_i - \hat{Y}_i.$$
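As a minimal sketch of the residual definition, the following computes each residual as the observation minus the model prediction. The function name `residuals` and the three sample values are hypothetical illustrations and are not taken from Table 11.1.

```python
def residuals(observed, predicted):
    """Return e_i = Y_i - Yhat_i for paired observations and predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


# Hypothetical observed values and model predictions (illustration only).
observed = [3.2, 2.9, 3.5]
predicted = [3.0, 3.1, 3.4]
e = residuals(observed, predicted)
print(e)
```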
In performing the regression analysis, the usual assumptions are made that the errors are
independent and follow a normal distribution with zero mean and constant variance.
Hence, if the prediction model adequately describes the behavior of the
actual system, its residuals should not exhibit tendencies towards non-normality.
A GOF test for normality is usually conducted on these residuals to determine the
adequacy of the prediction model.
The population mean, μ, for this test is 0 and the standard deviation, σ, is assumed
to be unknown and has to be estimated from the data. For theoretical consistency,
these parameters should be estimated using the maximum likelihood method. Under
the null hypothesis, the sample mean, $\bar{X}$, is a maximum likelihood estimator for the
population mean. The population standard deviation for a sample of size n can be
estimated from
$$\hat{\sigma} = \left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \right]^{1/2}$$
where $X_i$ represents observation i. This is the maximum likelihood estimator for the
population standard deviation given the null hypothesis that the data comes from a
normal distribution.
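The maximum likelihood estimate can be sketched as follows, using the 40 residuals transcribed from Table 11.1. The helper name `mle_sigma` is an assumption made for illustration. The divisor is n, so the result should match the population standard deviation (`statistics.pstdev`) and be slightly smaller than the usual n − 1 sample standard deviation.

```python
import math
import statistics

# Residuals transcribed from Table 11.1 (5 rows of 8 values).
RESIDUALS = [
    0.06, -0.28, 0.54, 0.29, -0.33, 0.05, -0.19, 0.55,
    0.39, -0.47, -0.39, 0.69, -0.19, -0.62, -0.07, -0.58,
    0.29, 0.30, 0.15, 0.48, -0.08, -0.33, 0.24, -0.52,
    0.32, -0.49, 0.07, 0.50, -0.04, 0.33, 0.18, -0.20,
    0.14, -0.24, 0.19, -0.19, 0.03, -0.14, 0.07, -0.57,
]


def mle_sigma(xs):
    """Maximum likelihood estimate of sigma under normality (divisor n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)


sigma_hat = mle_sigma(RESIDUALS)
sample_sd = statistics.stdev(RESIDUALS)  # divisor n - 1, for comparison
print(f"MLE sigma: {sigma_hat:.4f}, sample sd (n - 1): {sample_sd:.4f}")
```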
The data is then ordered and grouped into k classes. The determination of class
groupings for sample data sets has long been a subject of contention. A relatively robust
rule-of-thumb method frequently suggested is for the expected frequency within
each class to be at least 5. However, it should be noted that there has been no general
agreement regarding this minimum expected frequency. If the expected frequency is
too small, adjacent cells can be combined.
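One way to combine adjacent cells can be sketched as a left-to-right pooling pass. This is an illustrative approach under stated assumptions, not a procedure prescribed by the text: the function name `merge_small_classes`, the threshold argument `min_expected`, and the class counts below are all hypothetical.

```python
def merge_small_classes(observed, expected, min_expected=5.0):
    """Pool adjacent classes, left to right, until each pooled class
    has an expected frequency of at least min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    # Fold any leftover tail into the last pooled class.
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


# Hypothetical observed and expected class frequencies (illustration only).
observed = [1, 4, 9, 12, 8, 4, 2]
expected = [1.5, 4.0, 8.5, 12.0, 8.5, 4.0, 1.5]
merged_obs, merged_exp = merge_small_classes(observed, expected)
print(merged_obs, merged_exp)
```

Pooling preserves the total observed and expected counts, so only the number of classes k changes.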
For a discrete distribution such as Poisson or binomial, data can be naturally assigned
into discrete classes with class boundaries clearly defined based on this rule
of thumb. The classes may be set up so that the number of classes does not
exceed n/5, which keeps the expected number of observations in each class at 5 or
more when the classes are equiprobable. The class boundaries can be determined in an equiprobability manner where
the probability of a random observation falling into each class is equal and estimated
to be 1/k for k classes as follows:
$$P(x \le x_1) = \frac{1}{k}, \quad P(x \le x_2) = \frac{2}{k}, \quad \ldots, \quad P(x \le x_{k-1}) = \frac{k-1}{k}.$$
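For a normal null distribution, the equiprobable boundaries are the j/k quantiles of the fitted distribution. The sketch below assumes a normal distribution and uses Python's `statistics.NormalDist`. The function name `equiprobable_boundaries` and the value σ = 0.35 are hypothetical placeholders, not results from the text; n = 40 matches the number of residuals in Table 11.1.

```python
from statistics import NormalDist


def equiprobable_boundaries(mu, sigma, k):
    """Return x_1, ..., x_{k-1} with P(X <= x_j) = j/k under N(mu, sigma^2)."""
    if k < 2:
        raise ValueError("need at least two classes")
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(j / k) for j in range(1, k)]


n = 40               # sample size, as in Table 11.1
k = n // 5           # largest class count with expected frequency >= 5
sigma_assumed = 0.35  # hypothetical placeholder for the estimated sigma
bounds = equiprobable_boundaries(0.0, sigma_assumed, k)
print([round(b, 3) for b in bounds])
```

Each of the k classes then has expected frequency n/k, which is 5 for these values.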