Page 57 - CITP Review

The memory-based reasoning (MBR) method works well for matching and fraud detection. MBR starts
from a set of pre-classified examples, that is, past transactions whose results are known and can
therefore be classified accurately. A distance metric then compares each new observation with those
examples, for instance by counting the number of matching fields, and the outcomes of the closest
examples are used to predict the outcome of the new observation.
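
As a rough illustration of the matching-fields idea, the sketch below ranks pre-classified examples by how many fields they share with a new transaction and takes a majority vote among the closest ones. The field names, the sample transactions, and the choice of k = 3 are assumptions made for this example, not part of any particular MBR product.

```python
from collections import Counter

def matching_fields(a, b, fields):
    """Count how many of the given fields hold identical values in a and b."""
    return sum(1 for f in fields if a[f] == b[f])

def mbr_predict(new_obs, examples, fields, k=3):
    """Predict an outcome by majority vote among the k best-matching examples."""
    ranked = sorted(
        examples,
        key=lambda ex: matching_fields(new_obs, ex, fields),
        reverse=True,  # most matching fields first
    )
    votes = Counter(ex["outcome"] for ex in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical pre-classified transactions (invented for illustration).
history = [
    {"channel": "online", "country": "US", "card_present": False, "outcome": "fraud"},
    {"channel": "online", "country": "US", "card_present": False, "outcome": "fraud"},
    {"channel": "store", "country": "US", "card_present": True, "outcome": "legitimate"},
    {"channel": "store", "country": "CA", "card_present": True, "outcome": "legitimate"},
]
FIELDS = ["channel", "country", "card_present"]

new_txn = {"channel": "online", "country": "CA", "card_present": False}
print(mbr_predict(new_txn, history, FIELDS, k=3))
```

Counting matching fields is only one possible distance metric; numeric fields would normally need a metric that measures how far apart two values are rather than whether they are equal.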

The cluster methodology is based on classical statistical clustering algorithms and is useful for
predictions, such as timely loan repayment. In clustering, the average characteristics of the
pre-classified examples that share an outcome serve as the reference point for that outcome. The
accumulated distance, across attributes, from the new observation to each outcome’s average provides
the prediction: the new observation is assigned the outcome to which it lies closest. Usually, the
values of all attributes are normalized to a 0-to-1 scale so that no single attribute dominates the
distance.
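
The following is a minimal sketch of that procedure, assuming min-max normalization and Euclidean distance (the text does not name a specific distance measure). The attributes (income in thousands, years in current job), the sample values, and the function names are hypothetical.

```python
import math

def min_max_normalize(rows):
    """Rescale each column to the 0-1 range; return scaled rows and the scaler."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero

    def scale(row):
        return [(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]

    return [scale(r) for r in rows], scale

def centroids(rows, labels):
    """Average the attribute values of the examples that share each outcome."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in groups.items()
    }

def predict(row, cents):
    """Return the outcome whose average lies closest to the (scaled) row."""
    return min(cents, key=lambda label: math.dist(row, cents[label]))

# Hypothetical pre-classified loans: [income in $000s, years in current job].
raw = [[80, 10], [70, 8], [25, 1], [30, 2]]
outcomes = ["on_time", "on_time", "late", "late"]

scaled, scale = min_max_normalize(raw)
cents = centroids(scaled, outcomes)
print(predict(scale([65, 7]), cents))
```

New observations must be scaled with the same minimum and range as the pre-classified examples, which is why the sketch returns the scaler alongside the scaled rows.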

Decision tree algorithms can be developed to automatically generate a set of business process rules. The
most differentiating attribute of the pre-classified examples is used to build a decision rule. For
example, for a banking loan decision: if credit history is good, the outcome is pay on time 89% of the
time; if credit history is none, the outcome is pay late 75% of the time; if credit history is poor and
income is < $30,000, the outcome is default 80% of the time. If the first branch of the decision tree
does not have sufficiently high predictive power, the next branch is examined.
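
One way to picture how the most differentiating attribute might be chosen is sketched below. It scores each attribute by how often the majority outcome within each attribute value is correct, which is a simplified stand-in for the measures real decision tree algorithms use (such as information gain or Gini impurity). The loan records, attribute names, and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def group_outcomes(examples, attribute):
    """Collect the outcomes observed for each value of the attribute."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[attribute]].append(ex["outcome"])
    return groups

def split_accuracy(examples, attribute):
    """Share of examples predicted correctly if each value predicts its majority outcome."""
    groups = group_outcomes(examples, attribute)
    correct = sum(Counter(outs).most_common(1)[0][1] for outs in groups.values())
    return correct / len(examples)

def best_attribute(examples, attributes):
    """Pick the attribute whose split separates the outcomes best."""
    return max(attributes, key=lambda a: split_accuracy(examples, a))

def rules_for(examples, attribute):
    """Build one rule per attribute value: (majority outcome, share of that outcome)."""
    rules = {}
    for value, outs in group_outcomes(examples, attribute).items():
        outcome, count = Counter(outs).most_common(1)[0]
        rules[value] = (outcome, count / len(outs))
    return rules

# Hypothetical pre-classified loans (invented for illustration).
loans = [
    {"credit_history": "good", "income_band": "high", "outcome": "on_time"},
    {"credit_history": "good", "income_band": "low", "outcome": "on_time"},
    {"credit_history": "good", "income_band": "high", "outcome": "on_time"},
    {"credit_history": "none", "income_band": "high", "outcome": "late"},
    {"credit_history": "none", "income_band": "low", "outcome": "late"},
    {"credit_history": "poor", "income_band": "low", "outcome": "default"},
    {"credit_history": "poor", "income_band": "high", "outcome": "on_time"},
    {"credit_history": "poor", "income_band": "low", "outcome": "default"},
]

root = best_attribute(loans, ["credit_history", "income_band"])
rules = rules_for(loans, root)
for value, (outcome, share) in rules.items():
    print(f"if {root} is {value}: {outcome} ({share:.0%} of the time)")
```

In this sample the rule for poor credit history is the weakest, so that branch is the natural candidate for a further split on another attribute, such as income.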

Market basket analysis is the least structured form of data mining and involves what is known as
shopping basket analysis in retail outlets and the food industry. The intent of this methodology is to
identify products that tend to be purchased together.
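
A minimal sketch of the idea counts how often each pair of products appears in the same basket. The baskets and the `pair_counts` helper are made up for this example; production tools typically go further and derive association rules with measures such as confidence and lift.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical shopping baskets (invented for illustration).
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]

counts = pair_counts(baskets)
for pair, n in counts.most_common(3):
    # support = share of all baskets that contain the pair
    print(pair, n, f"support={n / len(baskets):.2f}")
```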

The link analysis methodology is sometimes applied in the insurance industry to identify fraudulent
claims. Tools such as Analyst’s Notebook, Net-Map, and Watson construct links to various objects to
identify associations that might otherwise go unnoticed. The latest generation of link analysis tools
provides not only graphical images of the links but also some interpretation of the links.

Querying

Querying refers to structured query language (SQL) and similar tools that can filter data into
meaningful information. SQL can insert, query, update, and delete data; for data analytics and reporting,
the query function is generally the one used.
In querying, data is filtered using the select command to choose attributes (fields, columns). Other
commands choose the file or table (from) or the records (where), and provide a variety of other
functions to facilitate the generation of customized information.
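
The select, from, and where clauses can be sketched with Python's built-in sqlite3 module and an in-memory database. The `invoices` table, its columns, and the sample rows are assumptions made for this example.

```python
import sqlite3

# In-memory database with a hypothetical invoices table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER, customer TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 1500.0, "unpaid"),
        (2, "Birch", 400.0, "unpaid"),
        (3, "Cedar", 2200.0, "paid"),
        (4, "Dune", 3100.0, "unpaid"),
    ],
)

# SELECT chooses the columns, FROM the table, WHERE the records.
rows = conn.execute(
    "SELECT customer, amount FROM invoices "
    "WHERE status = ? AND amount > ? "
    "ORDER BY amount DESC",
    ("unpaid", 1000),
).fetchall()
conn.close()

for customer, amount in rows:
    print(customer, amount)
```

Passing the filter values as parameters (the `?` placeholders) rather than building the query string by hand keeps the query reusable and avoids SQL injection.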

With Microsoft SQL Server Analysis Services, a special query language, Data Mining Extensions (DMX),
is used for data mining models. It works much like SQL does with databases, using a data definition
language (DDL) and a data manipulation language (DML); however, in DMX the functionality is built
specifically for data mining purposes, so there are special commands to build prediction models and
other data mining models.
