Page 187 - Six Sigma Advanced Tools for Black Belts and Master Black Belts
August 31, 2006
Introduction to the Analysis of Categorical Data
continuous scales. This chapter attempts to provide a practical introduction to such
techniques.
In statistical terminology, KPOVs and KPIVs are commonly known as response and
explanatory variables, respectively. The use of ‘KPOV’ and ‘KPIV’ in typical Six Sigma
terminology tends to imply a causal relationship between the input and output vari-
ables, whereas the terms ‘response’ and ‘explanatory’ have more generic connota-
tions. In many situations, the main purpose of studies conducted for transactional or
manufacturing processes with qualitative responses may be to analyze the generic
associations between qualitative response and explanatory variables. The ability to
model causal relationships typically depends on the sampling design, a matter which
is beyond the scope of this chapter. In any case, models that can describe the causal
relationships between KPIVs and KPOVs are specialized cases of the statistical mod-
els described in this chapter. In view of this, the present chapter refers to response
and explanatory variables instead of KPOVs and KPIVs, respectively.
Categorical response variables are frequently encountered in many real-world processes
and are particularly well suited to data from transactional processes. Qualitative
responses such as the degree of customer satisfaction with a particular transactional
process are usually measured on a categorical scale such as ‘dissatisfied’, ‘neutral’,
and ‘satisfied’. Such data may arise from customer satisfaction surveys, and the spec-
trum of response can be widened or narrowed depending on the survey design. In
certain situations categorical data may also arise more artificially due to the high cost
of obtaining continuous data. It should be noted that resorting to regression tech-
niques based on OLS assumptions for analyzing categorical response data may result
in highly flawed conclusions. Typically, categorical response data contains less information
and is therefore more difficult to analyze, and the statistical quality of the results
is usually not as good as that of results obtained from continuous data.
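One way to see why OLS-based regression can mislead with a categorical response is to fit a straight line to a two-category response coded as 0 and 1. The sketch below is only an illustration under stated assumptions: the data values and the helper name `fit_ols` are hypothetical and do not come from this chapter. It shows that the fitted line can produce values below 0 and above 1, which cannot be read as proportions.

```python
# Hypothetical data: a binary response (0 = 'dissatisfied', 1 = 'satisfied')
# observed at eight levels of a single explanatory variable.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted values outside [0, 1] have no interpretation as a proportion.
below_zero = [f for f in fitted if f < 0]
above_one = [f for f in fitted if f > 1]

print("fitted values:", [round(f, 3) for f in fitted])
print("below 0:", len(below_zero), "above 1:", len(above_one))
```

With these made-up values the fitted line falls below 0 at the smallest level of the explanatory variable and exceeds 1 at the largest, which is one symptom of the mismatch between OLS assumptions and a categorical response.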
In certain situations, the categorical response or explanatory variables possess natural
ordering information. Such ordering information can be exploited to provide
more precise and sensitive statistical inferences. Data with such ordinal information is
a special case of categorical data and is typically known as ordinal data. The customer
satisfaction scale described in the previous paragraph is one example. Other examples
include the response to a particular medical treatment in a clinical trial and the re-
sponse to certain doping processes in metal treatment procedures, where the response
can be classified into ordered categorical scales such as ‘not effective’, ‘effective’, and
‘very effective’. In contrast to ordinal categorical variables, categorical variables with
no inherent ordinal information are also pervasive. Such variables are known as nominal
variables; examples include race, which can be classified as ‘Chinese’, ‘Malay’,
‘Indian’, ‘Eurasian’, etc., and color, consisting of ‘red’, ‘green’, ‘blue’, etc. Analyses of
nominal data are more generic than those of ordinal categorical data. Both types of
categorical data analysis procedure are introduced in this chapter.
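The distinction between ordinal and nominal data can be illustrated with a small sketch. The sample responses, the rank coding and the variable names below are assumptions made purely for illustration. The point is that an ordered scale supports order-based summaries such as a median category, whereas for a nominal variable only counts and the modal category are meaningful.

```python
from collections import Counter
from statistics import median_low

# Ordinal variable: the category order carries information.
# The 0/1/2 ranks are an assumed coding, not one prescribed by the text.
satisfaction_scale = ["dissatisfied", "neutral", "satisfied"]
rank = {category: i for i, category in enumerate(satisfaction_scale)}

# Hypothetical survey responses.
responses = ["satisfied", "neutral", "dissatisfied", "satisfied",
             "neutral", "satisfied", "dissatisfied"]

# A median category is meaningful because the ranks respect the ordering.
median_rank = median_low(rank[r] for r in responses)
median_category = satisfaction_scale[median_rank]

# Nominal variable: no ordering, so summarise by frequency only.
colours = ["red", "green", "blue", "red", "red"]
colour_counts = Counter(colours)
modal_colour = colour_counts.most_common(1)[0][0]

print("median satisfaction category:", median_category)
print("colour counts:", dict(colour_counts))
print("modal colour:", modal_colour)
```

Assigning ranks to the colours would impose an ordering that the data does not have, which is why only the frequency summary is computed for the nominal variable.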
There are generally two broad streams of development in the analysis of categorical
data: procedures based on the use of contingency tables and those based on gener-
alized linear modeling. Techniques derived from the use of contingency tables were
the original statistical tools used in categorical data analysis; these are discussed in
the next section. This is followed by a case study demonstrating the use of these
fundamental techniques. Later progress in linear modeling techniques led to the de-
velopment of generalized linear modeling techniques for categorical data analysis;

