These data are interesting by themselves, but we can refine the analysis by taking another step and considering additional probabilities. For example, what proportion of the customers who bought a mask also bought fins? Masks were purchased 270 times, and of those individuals who bought masks, 250 also bought fins. Thus, given that a customer bought a mask, we can estimate the probability that he or she will buy fins to be 250/270, or .926. In market-basket terminology, such a conditional probability estimate is called the confidence.
Reflect on the meaning of this confidence value. The likelihood of someone walking in the door and buying fins is 280/400, or .7. But the likelihood of someone buying fins, given that he or she bought a mask, is .926. Thus, if someone buys a mask, the likelihood that he or she will also buy fins increases substantially, from .7 to .926. Accordingly, all sales personnel should be trained to try to sell fins to anyone buying a mask.
Now consider dive computers and fins. Of the 400 transactions, fins were sold 280 times, so the probability that someone walks into the store and buys fins is .7. But of the 120 purchases of dive computers, only 20 appeared with fins. So the likelihood of someone buying fins, given that he or she bought a dive computer, is 20/120, or .167. Thus, when someone buys a dive computer, the likelihood that he or she will also buy fins falls from .7 to .167.
The ratio of confidence to the base probability of buying an item is called lift. Lift shows
how much the base probability increases or decreases when other products are purchased. The
lift of fins and a mask is the confidence of fins given a mask, divided by the base probability of
fins. In Figure 9-21, the lift of fins and a mask is .926/.7, or 1.32. Thus, the likelihood that people
buy fins when they buy a mask increases by 32 percent. Surprisingly, it turns out that the lift of
fins and a mask is the same as the lift of a mask and fins. Both are 1.32.
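
The symmetry is less surprising once lift is written as the joint probability divided by the product of the two base probabilities, a form in which the two items play identical roles. The short sketch below, again using only the counts reported in the text, computes both lifts.

```python
# Lift is confidence divided by the base probability of the consequent.
# Equivalently, lift = P(A and B) / (P(A) * P(B)), which is unchanged when
# the two items are swapped.

total_baskets = 400
p_fins = 280 / total_baskets           # .7
p_mask = 270 / total_baskets           # .675
p_mask_and_fins = 250 / total_baskets  # .625

lift_fins_given_mask = (p_mask_and_fins / p_mask) / p_fins
lift_mask_given_fins = (p_mask_and_fins / p_fins) / p_mask

print(round(lift_fins_given_mask, 2))  # 1.32
print(round(lift_mask_given_fins, 2))  # 1.32
```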
We need to be careful here, though, because this analysis considers only shopping carts with two items. We cannot say from these data what the likelihood is that customers, given that they bought a mask, will buy both weights and fins. To assess that probability, we need to analyze shopping carts with three items. This statement illustrates, once again, that we need to know what problem we're solving before we start to build the information system to mine the data. The problem definition will help us decide whether we need to analyze three-item, four-item, or even larger shopping carts.
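
To make that point concrete, the sketch below counts three-item combinations in a small transaction list. The individual baskets are entirely made up for illustration, since the text reports only summary counts.

```python
# Hypothetical illustration only: the baskets below are invented, because the
# text gives only summary counts. Estimating P(weights and fins | mask)
# requires counting baskets that contain all three items.

baskets = [
    {"mask", "fins", "weights"},
    {"mask", "fins"},
    {"mask", "weights"},
    {"dive computer"},
    {"mask", "fins", "weights", "snorkel"},
]

with_mask = [b for b in baskets if "mask" in b]
with_all_three = [b for b in with_mask if {"fins", "weights"} <= b]

# Estimated confidence of {mask} -> {fins, weights} on this made-up data.
print(len(with_all_three) / len(with_mask))  # 2/4 = 0.5
```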
Many organizations are benefiting from market-basket analysis today. You can expect that
this technique will become a standard CRM analysis during your career.
Decision Trees
A decision tree is a hierarchical arrangement of criteria that predict a classification or a value.
Here we will consider decision trees that predict classifications. Decision tree analysis is an unsupervised data mining technique: The analyst sets up the computer program and provides the data to be analyzed, and the decision tree program produces the tree.
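
As an illustration of that division of labor, the sketch below hands a tiny, made-up loan dataset to scikit-learn's DecisionTreeClassifier and prints the tree the program produces. The column names and values are invented for this sketch and are not taken from Figure 9-22.

```python
# A minimal sketch of letting a decision tree program "produce the tree."
# The tiny dataset and column names are invented; scikit-learn's
# DecisionTreeClassifier stands in for a commercial data mining tool.

from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [percent_past_due, loan_age_months]; label: 1 = default, 0 = no default
X = [
    [0.0, 12], [0.1, 30], [0.2, 6], [0.3, 24],   # mostly current loans
    [0.6, 18], [0.7, 40], [0.8, 9], [0.9, 36],   # mostly past-due loans
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["percent_past_due", "loan_age_months"]))
```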
A common business application of decision trees is to classify loans by likelihood of default.
Organizations analyze data from past loans to produce a decision tree that can be converted to
loan-decision rules. A financial institution could use such a tree to assess the default risk on a
new loan. Sometimes, too, financial institutions sell a group of loans (called a loan portfolio) to
one another. An institution considering the purchase of a loan portfolio can use the results of a
decision tree program to evaluate the risk of a given portfolio.
Figure 9-22 shows an example provided by Insightful Corporation, a vendor of BI tools, generated using its Insightful Miner product. The tool examined data from 3,485 loans; of those loans, 72 percent were not in default and 28 percent were. To perform the analysis, the decision tree tool considered six different loan characteristics.
In this example, the decision tree program determined that the percentage of the loan that is past due (PercPastDue) is the best first criterion. Reading Figure 9-22, you can see that of the 2,574 loans with a PercPastDue value of 0.5 or less (the amount past due is no more than half the loan amount), 94 percent were not in default. Reading down several lines in this tree, 911 loans had a PercPastDue value greater than 0.5; of those loans, 89 percent were in default.
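
Expressed as a loan-decision rule, the top-level split just described might look like the following sketch. It captures only the single PercPastDue criterion reported in the text, not the additional branches of the full tree in Figure 9-22.

```python
# The single split described above, written as a loan-decision rule. Only the
# top-level PercPastDue criterion from the text is captured; the full tree in
# Figure 9-22 branches on additional loan characteristics not reproduced here.

def default_risk(perc_past_due: float) -> str:
    """Rough risk class based on the first split of the tree described above."""
    if perc_past_due <= 0.5:
        return "low risk: 94 percent of such loans were not in default"
    return "high risk: 89 percent of such loans were in default"

print(default_risk(0.2))
print(default_risk(0.8))
```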