Six Sigma Advanced Tools for Black Belts and Master Black Belts

          August 31, 2006
                          Introduction to the Analysis of Categorical Data
        continuous scales. This chapter attempts to provide a practical introduction to such
        techniques.
          In statistical terminology, KPOVs and KPIVs are commonly known as response and
        explanatory variables, respectively. The use of ‘KPOV’ and ‘KPIV’ in typical Six Sigma
        terminology tends to imply a causal relationship between the input and output vari-
        ables, whereas the terms ‘response’ and ‘explanatory’ have more generic connota-
        tions. In many situations, the main purpose of studies conducted for transactional or
        manufacturing processes with qualitative responses may be to analyze the generic
        associations between qualitative response and explanatory variables. The ability to
        model causal relationships typically depends on the sampling design, a matter which
        is beyond the scope of this chapter. In any case, models that describe the causal
        relationships between KPIVs and KPOVs are special cases of the statistical models
        described in this chapter. For this reason, this chapter refers to response and
        explanatory variables instead of KPOVs and KPIVs, respectively.
          Categorical response variables are frequently encountered in many real-world
        processes and are particularly well suited to data from transactional processes. Qualitative
        responses such as the degree of customer satisfaction with a particular transactional
        process are usually measured on a categorical scale such as ‘dissatisfied’, ‘neutral’,
        and ‘satisfied’. Such data may arise from customer satisfaction surveys, and the spec-
        trum of response can be widened or narrowed depending on the survey design. In
        certain situations categorical data may also arise more artificially due to the high cost
        of obtaining continuous data. It should be noted that analyzing categorical response
        data with regression techniques based on ordinary least squares (OLS) assumptions may
        lead to seriously flawed conclusions. Categorical response data typically contains less
        information and is therefore more difficult to analyze, and the statistical quality of
        the results is usually lower than that obtained from continuous data.
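        To make the warning about OLS concrete, the following minimal Python sketch fits an
        ordinary least-squares line to a small, made-up binary (0/1) response and checks whether
        the fitted values stay inside [0, 1], as probabilities must. The data and the function
        name `ols_line` are hypothetical and chosen only for illustration; the sketch shows one
        well-known symptom of the problem, not every way in which OLS can mislead.

```python
def ols_line(x, y):
    """Closed-form simple linear regression by least squares: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: x is a process setting, y = 1 if the unit was judged defective.
x = list(range(1, 11))
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = ols_line(x, y)
fitted = [b0 + b1 * xi for xi in x]

# A probability must lie in [0, 1]; OLS places no such constraint on its fitted values.
out_of_range = [(xi, round(f, 3)) for xi, f in zip(x, fitted) if not 0.0 <= f <= 1.0]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print("fitted values outside [0, 1]:", out_of_range)
```

        Even within the observed range of x, the straight line produces a negative "probability"
        at one end and one greater than 1 at the other. Models designed for categorical responses,
        such as logistic regression, constrain fitted probabilities to the unit interval.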
          In certain situations, the categorical response or explanatory variables possess
        natural ordering information. Such ordering information can be leveraged to provide
        more precise and sensitive statistical inferences. Data with such ordinal information is
        a special case of categorical data and is typically known as ordinal data. The customer
        satisfaction scale described in the previous paragraph is one example. Other examples
        include the response to a particular medical treatment in a clinical trial and the re-
        sponse to certain doping processes in metal treatment procedures, where the response
        can be classified into ordered categorical scales such as ‘not effective’, ‘effective’, and
        ‘very effective’. In contrast to ordinal categorical variables, categorical variables with
        no inherent ordinal information are also pervasive. Such variables are known as
        nominal variables; examples include race, which can be classified as ‘Chinese’, ‘Malay’,
        ‘Indian’, ‘Eurasian’, etc., and color, consisting of ‘red’, ‘green’, ‘blue’, etc. Methods
        for nominal data are more general than those for ordinal data, because they do not rely
        on any ordering of the categories. Procedures for both types of categorical data are
        introduced in this chapter.
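        The difference between ordinal and nominal data can be sketched in a few lines of Python.
        Assuming hypothetical survey and color counts (not data from this chapter), the example
        computes plain category proportions, which are valid for both kinds of data, and
        cumulative proportions P(Y <= level), which are meaningful only when the categories have
        a natural order. The helper names `proportions` and `cumulative_proportions` are
        illustrative, not from any particular library.

```python
from itertools import accumulate


def proportions(counts):
    """Share of observations in each category (valid for nominal and ordinal data)."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


def cumulative_proportions(levels, counts):
    """P(Y <= level) for each level, taken in the order given by `levels`.

    Only meaningful when `levels` follows a natural ordering of the categories.
    """
    total = sum(counts[level] for level in levels)
    running = accumulate(counts[level] for level in levels)
    return {level: c / total for level, c in zip(levels, running)}


# Ordinal response: hypothetical customer-satisfaction survey counts.
satisfaction_levels = ["dissatisfied", "neutral", "satisfied"]
satisfaction_counts = {"dissatisfied": 12, "neutral": 30, "satisfied": 58}

# Nominal variable: hypothetical color counts; any listing order is arbitrary.
color_counts = {"red": 20, "green": 45, "blue": 35}

sat_cum = cumulative_proportions(satisfaction_levels, satisfaction_counts)
print("satisfaction proportions:", proportions(satisfaction_counts))
print("P(response <= level):", sat_cum)

# For nominal data, "cumulative" shares depend on an arbitrary ordering choice.
color_cum_a = cumulative_proportions(["red", "green", "blue"], color_counts)
color_cum_b = cumulative_proportions(["blue", "green", "red"], color_counts)
print("P(<= green), order A:", color_cum_a["green"])
print("P(<= green), order B:", color_cum_b["green"])
```

        The ordering is passed explicitly as a list rather than taken from dictionary order, to
        emphasize that a cumulative summary is only as meaningful as the ordering supplied. For the
        satisfaction scale the cumulative shares have a clear interpretation; for colors, two
        equally arbitrary orderings give different answers for the same category.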
          There are generally two broad streams of development in the analysis of categorical
        data: procedures based on the use of contingency tables and those based on gener-
        alized linear modeling. Techniques derived from the use of contingency tables were
        the original statistical tools used in categorical data analysis; these are discussed in
        the next section. This is followed by a case study demonstrating the use of these
        fundamental techniques. Later progress in linear modeling techniques led to the de-
        velopment of generalized linear modeling techniques for categorical data analysis;