Page 92 - Six Sigma Advanced Tools for Black Belts and Master Black Belts

It is practically impossible to determine the true values of the population parameters
$\mu$ and $\sigma$ from a finite sample of size $n$. Thus, sample statistics are used. Suppose that
$x_1, x_2, \ldots, x_n$ are the observations in a sample. Then the variability of the process
sample data is measured by the sample variance,

$$ s_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}, $$

where $\bar{x}$ is the sample mean given by $\left(\sum_{i=1}^{n} x_i\right)/n$. Note that the sample variance is
simply the sum of the squared deviations of each observation from the sample mean,
divided by the sample size. However, the sample variance so defined is not an unbiased
estimator of the population variance $\sigma^2$. In order to obtain an unbiased estimator for
$\sigma^2$, it is necessary instead to define a 'bias-corrected sample variance',

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}. $$

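As a small illustration of the two divisors, the sketch below computes both versions of the sample variance for a made-up sample. The data values and the variable names (`sample`, `s2_n`, `s2`) are assumptions chosen for the example, not taken from the text; the standard-library functions `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1) are used only as a cross-check.

```python
import statistics

# Hypothetical sample of n = 5 observations (illustrative values only).
sample = [4, 7, 5, 9, 5]
n = len(sample)
x_bar = sum(sample) / n

# Sum of squared deviations of each observation from the sample mean.
ss = sum((x - x_bar) ** 2 for x in sample)

s2_n = ss / n        # divisor n: the biased sample variance
s2 = ss / (n - 1)    # divisor n - 1: the bias-corrected sample variance

# Cross-check against the standard library, which offers both divisors.
assert abs(s2_n - statistics.pvariance(sample)) < 1e-12
assert abs(s2 - statistics.variance(sample)) < 1e-12

print(f"x_bar = {x_bar}, s_n^2 = {s2_n}, s^2 = {s2}")
```

For any sample with at least two distinct values, the divisor n − 1 gives the larger of the two figures; the difference shrinks as n grows.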
An intuitive way to see why $s_n^2$ gives a biased estimator of the population variance is
that the true value of the population mean, $\mu$, is almost never known, and so the sum
of the squared deviations about the sample mean $\bar{x}$ must be used instead. However,
the observations $x_i$ tend to be closer to their sample mean than to the population mean.
Therefore, to compensate for this, $n - 1$ is used as the divisor rather than $n$. If $n$ were used
as the divisor in the sample variance, we would obtain a measure of variability that
is, on average, consistently smaller than the true population variance $\sigma^2$. Another
way to think about this is to consider the sample variance $s^2$ as being based on $n - 1$
degrees of freedom, since the sample mean is used in place of the population mean.
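The intuitive argument can be made exact with a short derivation. The sketch below assumes, beyond what the text states explicitly, that the observations are independent with common mean $\mu$ and variance $\sigma^2$, so that the variance of $\bar{x}$ is $\sigma^2/n$.

```latex
% Split the deviations about \mu into deviations about \bar{x} plus a remainder
% (the cross term vanishes because \sum_i (x_i - \bar{x}) = 0):
\sum_{i=1}^{n} (x_i - \mu)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 .

% Take expectations, using E(x_i - \mu)^2 = \sigma^2 and E(\bar{x} - \mu)^2 = \sigma^2 / n:
E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  = n\sigma^2 - n \cdot \frac{\sigma^2}{n}
  = (n - 1)\sigma^2 .

% Hence the divisor n underestimates on average, and the divisor n - 1 does not:
E(s_n^2) = \frac{n - 1}{n}\,\sigma^2 ,
\qquad
E(s^2) = \sigma^2 .
```

The first identity also shows why the observations sit closer to $\bar{x}$ than to $\mu$: the sum of squares about $\bar{x}$ can never exceed the sum of squares about $\mu$.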
If the individual observations are from the normal distribution, the sample-to-sample
randomness of $s^2$ is explained through the following random variable:

$$ \chi^2 = \frac{(n - 1)\, s^2}{\sigma^2}. \tag{6.1} $$

The random variable $\chi^2$ follows what is known as the chi-square distribution⁴ with
$n - 1$ degrees of freedom, which is also its mean. Even though the derivation of this
statistic is based on the normality of the $x$ variable, the results will hold approximately
as long as the departure from normality is not too severe.
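The behaviour of the statistic in (6.1) can be checked by simulation. The following is a minimal sketch, not a definitive implementation: the process parameters `MU` and `SIGMA`, the sample size `N`, the number of trials and the seed are all arbitrary assumptions made for illustration. It draws repeated normal samples, computes $(n - 1)s^2/\sigma^2$ for each, and compares the average with $n - 1$. The comparison of the variance with $2(n - 1)$ relies on a standard property of the chi-square distribution that is not stated in the text.

```python
import random
import statistics

random.seed(6)  # arbitrary fixed seed so that the run is repeatable

MU, SIGMA = 50.0, 2.0   # hypothetical process mean and standard deviation
N = 5                   # sample size n
TRIALS = 20_000         # number of simulated samples


def sample_variance(xs):
    """Bias-corrected sample variance (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


chi2_values = []
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    # The statistic of equation (6.1) for this sample.
    chi2_values.append((N - 1) * sample_variance(sample) / SIGMA**2)

mean_chi2 = statistics.fmean(chi2_values)
var_chi2 = statistics.variance(chi2_values)

print(f"mean of statistic:     {mean_chi2:.3f} (degrees of freedom: {N - 1})")
print(f"variance of statistic: {var_chi2:.3f} (2 x degrees of freedom: {2 * (N - 1)})")
```

Replacing `random.gauss` with a mildly non-normal generator is one way to explore how far the approximation mentioned above can be pushed.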

      6.2.1 The unbiased estimator
The sample mean $\bar{x}$ and the bias-corrected sample variance $s^2$ are unbiased estimators
of the population mean and variance, $\mu$ and $\sigma^2$, respectively. That is,

$$ E(\bar{x}) = \mu \quad \text{and} \quad E(s^2) = \sigma^2. $$

If there is no variability in the sample, then each sample observation $x_i = \bar{x}$, and the
sample variance $s^2 = 0$. Generally, the larger the sample variance $s^2$, the greater is the
variability in the sample data.
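The two unbiasedness identities can be verified exactly on a toy example by enumerating every possible sample. The sketch below assumes a hypothetical population (the six faces of a fair die) sampled with replacement, so that the observations are independent; the names `population`, `e_s2` and `e_s2_n` are illustrative. Exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: faces of a fair die
n = 3                             # sample size

# Population mean and variance, computed exactly.
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))


def mean(xs):
    return Fraction(sum(xs), len(xs))


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((Fraction(x) - m) ** 2 for x in xs)


# Expectations are plain averages over all equally likely samples.
e_xbar = sum(mean(s) for s in samples) / len(samples)
e_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
e_s2_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(f"mu = {mu}, E(x_bar) = {e_xbar}")
print(f"sigma^2 = {sigma2}, E(s^2) = {e_s2}, E(s_n^2) = {e_s2_n}")
```

In this example the average of $s^2$ over all 216 samples equals the population variance 35/12, while the average of the divisor-$n$ version is smaller by the factor $(n - 1)/n$.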
While the sample variance provides an unbiased estimate of the population variance,
the positive square root of the sample variance, known as the sample standard
deviation and denoted by $s$, is a biased estimator of the population standard deviation