Page 392 - Using MIS
360 Chapter 9 Business Intelligence Systems
One common unsupervised technique is cluster analysis. With it, statistical techniques
identify groups of entities that have similar characteristics. A common use for cluster analysis is
to find groups of similar customers from customer order and demographic data.
For example, suppose a cluster analysis finds two very different customer groups. Members of
the first group have an average age of 33, own four Android phones and three iPads, have an
expensive home entertainment system, drive a Lexus SUV, and tend to buy expensive children’s
play equipment. Members of the second group have an average age of 64, own Arizona vacation
property, play golf, and buy expensive wines. Suppose the analysis also finds that both groups
buy designer children’s clothing.
These findings are obtained solely by data analysis. There is no prior model about the patterns
and relationships that exist. It is up to the analyst to form hypotheses, after the fact, to explain
why two such different groups are both buying designer children’s clothes.
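To make the idea of grouping similar customers concrete, here is a minimal sketch using k-means, one common clustering algorithm. The text does not say which algorithm a cluster analysis tool would use, so k-means is an assumption, as are the function name `k_means`, the choice of features (age and number of devices owned), and the six invented customer records.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Group points into k clusters with a basic k-means loop."""
    # Assumption: seed the centroids with the first k points so the run is repeatable.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customer records: (age, number of devices owned).
customers = [(31, 7), (35, 6), (33, 8), (62, 1), (66, 2), (64, 1)]
centroids, clusters = k_means(customers, 2)
for center, members in zip(centroids, clusters):
    print(center, members)
```

On this toy data the loop separates the younger, device-heavy customers from the older ones without being told that such groups exist. Explaining why the groups differ is still left to the analyst.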
Supervised Data Mining
With supervised data mining, data miners develop a model prior to the analysis and apply
statistical techniques to data to estimate parameters of the model. For example, suppose
marketing experts in a communications company believe that cell phone usage on weekends is
determined by the age of the customer and the number of months the customer has had the cell
phone account. A data mining analyst would then run an analysis that estimates the effect of
customer age and account age on weekend usage.
One such analysis, which measures the effect of a set of variables on another variable, is
called a regression analysis. A sample result for the cell phone example is:
CellphoneWeekendMinutes = 12 + (17.5 × CustomerAge)
                             + (23.7 × NumberMonthsOfAccount)
Using this equation, analysts can predict the number of minutes of weekend cell phone use
by summing 12, plus 17.5 times the customer’s age, plus 23.7 times the number of months of the
account.
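The prediction step can be sketched as a short function. The coefficients come from the sample equation above. The function name `predict_weekend_minutes` and the example customer (age 30, account held for 10 months) are hypothetical.

```python
def predict_weekend_minutes(customer_age, months_of_account):
    """Apply the sample regression equation from the text."""
    return 12 + 17.5 * customer_age + 23.7 * months_of_account

# Hypothetical customer: 30 years old, account held for 10 months.
# 12 + (17.5 * 30) + (23.7 * 10) is about 774 minutes.
print(predict_weekend_minutes(30, 10))
```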
As you will learn in your statistics classes, considerable skill is required to interpret the
quality of such a model. The regression tool will create an equation, such as the one shown.
Whether that equation is a good predictor of future cell phone usage depends on statistical
measures, such as t values and confidence intervals, and on related statistical techniques.
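To show how a regression tool might arrive at such an equation, the sketch below estimates the intercept and the two coefficients by ordinary least squares, solving the normal equations with Gaussian elimination. This is one standard way to fit a linear regression, not necessarily what any particular tool does. The observations are invented, and the usage values are generated directly from the sample equation with no noise, so the fit simply recovers 12, 17.5, and 23.7. Real data would be noisy, which is why t values and confidence intervals matter. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_least_squares(rows, targets):
    """Return [intercept, slope_1, slope_2, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical observations: (customer age, months of account).
observations = [(25, 6), (30, 24), (41, 12), (52, 36), (60, 3)]
# Assumption: noise-free usage generated from the sample equation in the text.
minutes = [12 + 17.5 * age + 23.7 * months for age, months in observations]

intercept, age_coef, months_coef = fit_least_squares(observations, minutes)
print(round(intercept, 3), round(age_coef, 3), round(months_coef, 3))
```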
Neural networks are another popular supervised data mining application used to predict
values and make classifications such as “good prospect” or “poor prospect” customers. The
term neural networks is deceiving because it connotes a biological process similar to that in
animal brains. In fact, although the original idea of neural nets may have come from the anatomy
and physiology of neurons, a neural network is nothing more than a complicated set of possibly
nonlinear equations. Explaining the techniques used for neural networks is beyond the scope
of this text. If you want to learn more, search http://kdnuggets.com for the term neural network.
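As a rough illustration of the claim that a neural network is a set of possibly nonlinear equations, the sketch below evaluates a tiny network with two inputs, two hidden units, and one output. Everything in it is an assumption made for illustration: the weights are invented rather than learned from data, the inputs are presumed to be already scaled, and the function name `classify_prospect` and the 0.5 cutoff are hypothetical.

```python
from math import exp

def sigmoid(z):
    """A common nonlinear function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def classify_prospect(age_scaled, months_scaled):
    """Evaluate three nonlinear equations and turn the result into a label."""
    # Hidden layer: two equations over the inputs (invented weights).
    h1 = sigmoid(0.8 * age_scaled - 0.4 * months_scaled + 0.1)
    h2 = sigmoid(-0.5 * age_scaled + 0.9 * months_scaled - 0.2)
    # Output layer: one more equation over the hidden values.
    score = sigmoid(1.2 * h1 + 1.5 * h2 - 1.4)
    label = "good prospect" if score >= 0.5 else "poor prospect"
    return score, label

print(classify_prospect(1.0, 1.0))
print(classify_prospect(-1.0, -1.0))
```

In a real application the weights would be estimated from labeled examples, which is what makes the technique supervised.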
In the next sections, we will describe and illustrate two typical data mining tools,
market-basket analysis and decision trees, and show applications of those techniques. This
discussion should give you, a future manager, a sense of the nature and possibilities of data
mining. You will need additional coursework in statistics, data management, marketing, and
finance, however, before you will be able to perform such analyses yourself.
Market-Basket Analysis
Suppose you run a dive shop, and one day you realize that one of your salespeople is much
better than the others at up-selling to your customers. Any of your sales associates can fill a customer’s order, but
this one salesperson is especially good at selling customers items in addition to those for which
they ask. One day, you ask him how he does it.