Page 57 - CITP Review

Memory-based reasoning (MBR) works well for matching and fraud detection. MBR classifies a new
            observation by comparing it with pre-classified examples, that is, past transactions whose results are
            known and can therefore be classified accurately. A distance metric identifies the pre-classified
            examples that match the new observation on the highest number of fields, and the outcome of those
            examples is used to predict the outcome of the new observation.
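
As an illustration, the matching step can be sketched in Python. This is a minimal sketch, not a production fraud model: the field names, the records, and the helper names `matching_fields` and `classify` are all hypothetical, and the distance metric is simply the count of matching fields.

```python
# Pre-classified past transactions (hypothetical fields, values, and outcomes).
known_cases = [
    ({"country": "US", "channel": "online", "card_present": "no", "amount_band": "high"}, "fraud"),
    ({"country": "US", "channel": "store", "card_present": "yes", "amount_band": "low"}, "legitimate"),
    ({"country": "CA", "channel": "online", "card_present": "no", "amount_band": "low"}, "legitimate"),
]


def matching_fields(a, b):
    """Count the fields on which two records hold the same value."""
    return sum(1 for field in a if a[field] == b.get(field))


def classify(new_record, cases):
    """Return the outcome of the pre-classified case with the most matching fields."""
    _, best_outcome = max(cases, key=lambda case: matching_fields(new_record, case[0]))
    return best_outcome


new_transaction = {"country": "US", "channel": "online", "card_present": "no", "amount_band": "high"}
print(classify(new_transaction, known_cases))  # prints "fraud"
```

With real data, ties between equally close examples would need a rule of their own, such as a vote among the closest few examples.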

            The cluster methodology is based on classical statistical clustering algorithms and is useful for
            predictions, such as whether a loan will be repaid on time. In clustering, the average characteristics of
            pre-classified examples that share the same outcome serve as the reference point for a new observation.
            The accumulated distance between the new observation’s attributes and each outcome’s average
            attributes provides the prediction: the new observation is assigned the outcome whose attributes are
            closest. Usually, the values of all attributes are statistically normalized to a 0-to-1 scale so that no
            single attribute dominates the distance.
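
The normalization and distance steps can be sketched as follows. This is a simplified illustration under stated assumptions: the loan attributes and figures are hypothetical, normalization is plain min-max scaling, and the accumulated distance is the sum of absolute differences from each outcome's average.

```python
# Pre-classified loans: (income, years_employed, debt_ratio) and known outcome.
# All figures are hypothetical.
examples = [
    ((85000, 10, 0.20), "on_time"),
    ((72000, 7, 0.25), "on_time"),
    ((30000, 1, 0.60), "late"),
    ((38000, 2, 0.55), "late"),
]


def min_max_bounds(rows):
    """Return the (min, max) of each attribute across all rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(row, bounds):
    """Scale each attribute to the 0-to-1 range seen in the examples."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def outcome_averages(examples, bounds):
    """Average the normalized attributes of the examples sharing each outcome."""
    grouped = {}
    for row, outcome in examples:
        grouped.setdefault(outcome, []).append(normalize(row, bounds))
    return {o: [sum(col) / len(col) for col in zip(*rows)] for o, rows in grouped.items()}


def predict(row, averages, bounds):
    """Pick the outcome whose averages have the smallest accumulated distance."""
    point = normalize(row, bounds)
    return min(averages, key=lambda o: sum(abs(p - a) for p, a in zip(point, averages[o])))


bounds = min_max_bounds([row for row, _ in examples])
averages = outcome_averages(examples, bounds)
print(predict((40000, 3, 0.50), averages, bounds))  # prints "late"
```

Without the normalization step, income (in the tens of thousands) would swamp the debt ratio (a fraction), and the prediction would depend on income alone.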

            Decision tree algorithms can automatically generate a set of business process rules. The most
            differentiating attribute of the pre-classified examples is used to build the first decision rule. For
            example, for a banking loan decision: if credit history is good, the outcome is pay on time 89% of the
            time; if credit history is none, the outcome is pay late 75% of the time; if credit history is poor and
            income is < $30,000, the outcome is default 80% of the time. If the first branch of the decision tree does
            not have sufficient predictive power, the next branch is examined.
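
A rough sketch of how such rules could be derived from pre-classified examples follows. The loan records are hypothetical, and "most differentiating" is measured here as the share of examples that an attribute's majority-outcome rules classify correctly. That measure is an assumption made for brevity; decision tree algorithms in practice typically use measures such as entropy or the Gini index.

```python
from collections import Counter, defaultdict

# Hypothetical pre-classified loan examples.
loans = [
    {"credit_history": "good", "income": "high", "outcome": "on_time"},
    {"credit_history": "good", "income": "low", "outcome": "on_time"},
    {"credit_history": "good", "income": "high", "outcome": "on_time"},
    {"credit_history": "none", "income": "low", "outcome": "late"},
    {"credit_history": "none", "income": "high", "outcome": "late"},
    {"credit_history": "poor", "income": "low", "outcome": "default"},
    {"credit_history": "poor", "income": "low", "outcome": "default"},
    {"credit_history": "poor", "income": "high", "outcome": "late"},
]


def outcomes_by_value(attribute, rows):
    """Count outcomes separately for each value of the attribute."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row["outcome"]] += 1
    return counts


def rules_for(attribute, rows):
    """Map each attribute value to (most common outcome, share of rows with it)."""
    rules = {}
    for value, counts in outcomes_by_value(attribute, rows).items():
        outcome, hits = counts.most_common(1)[0]
        rules[value] = (outcome, hits / sum(counts.values()))
    return rules


def split_accuracy(attribute, rows):
    """Share of rows classified correctly by the attribute's majority rules."""
    counts = outcomes_by_value(attribute, rows)
    return sum(max(c.values()) for c in counts.values()) / len(rows)


best = max(["credit_history", "income"], key=lambda a: split_accuracy(a, loans))
print(best, rules_for(best, loans))

# The "poor" branch is right only two times in three, so examine the next branch.
poor = [row for row in loans if row["credit_history"] == "poor"]
print(rules_for("income", poor))
```

In this sample, credit history is chosen first, and income then separates the poor-credit examples into default and late.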

            Market basket analysis is the least structured form of data mining and involves what is known as
            shopping basket analysis in retail outlets and the food industry. The intent of this methodology is to
            identify products that tend to be purchased together.
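
Finding products that tend to be purchased together can be sketched by counting how often each pair of products appears in the same basket. The baskets below are hypothetical, and the sketch stops at simple pair counts and support (the share of baskets containing the pair); it does not generate full association rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, times = counts.most_common(1)[0]
support = times / len(baskets)
print(top_pair, times, support)  # ('bread', 'butter') 3 0.75
```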


            The link analysis methodology is sometimes applied in the insurance industry to identify fraudulent
            claims. Tools such as Analyst’s Notebook, Net-Map, and Watson construct links to various objects to
            identify associations that might otherwise go unnoticed. The latest generation of link analysis tools
            provides not only graphical images of the links but also some interpretation of the links.
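
The basic idea of constructing links can be sketched by connecting claims that share an attribute value. This is only an illustration of the concept with hypothetical claim data and a hypothetical `find_links` helper; it is not a description of how the commercial tools named above work.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical insurance claims and the attributes recorded on each.
claims = {
    "C1": {"phone": "555-0101", "address": "12 Oak St", "repair_shop": "Apex Auto"},
    "C2": {"phone": "555-0199", "address": "12 Oak St", "repair_shop": "Metro Body"},
    "C3": {"phone": "555-0142", "address": "9 Elm Ave", "repair_shop": "Metro Body"},
    "C4": {"phone": "555-0177", "address": "40 Pine Rd", "repair_shop": "City Collision"},
}


def find_links(claims):
    """Return {(claim_a, claim_b): [shared attributes]} for claims sharing a value."""
    holders = defaultdict(list)
    for claim_id, attributes in claims.items():
        for field, value in attributes.items():
            holders[(field, value)].append(claim_id)

    links = defaultdict(list)
    for (field, value), ids in holders.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].append(f"{field}={value}")
    return dict(links)


for pair, shared in find_links(claims).items():
    print(pair, shared)
```

Here C1 and C3 share nothing directly, yet both are linked to C2, which is the kind of association that might otherwise go unnoticed.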


            Querying

            Querying refers to the use of structured query language (SQL) and similar tools to filter data into
            meaningful information. SQL can insert, query, update, and delete data; for data analytics and reporting,
            the query function is generally the one used.

            In querying, the select command chooses the attributes (fields, columns) to return. Other commands
            choose the source table or file (from) or the records (where), and a variety of other functions facilitate
            the generation of customized information.
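
The select, from, and where commands can be tried with Python's built-in `sqlite3` module. The `invoices` table and its contents are hypothetical and exist only to give the query something to filter.

```python
import sqlite3

# Build a small in-memory table to query (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 1200.0, "paid"),
        (2, "Birch", 450.0, "overdue"),
        (3, "Acme", 3100.0, "overdue"),
    ],
)

query = """
    SELECT customer, amount      -- attributes (columns) to return
    FROM invoices                -- table to read
    WHERE status = 'overdue'     -- records (rows) to keep
    ORDER BY amount DESC         -- one of the other functions: sorting
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Acme', 3100.0), ('Birch', 450.0)]
conn.close()
```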

            With Microsoft SQL Server Analysis Services, a special query language, Data Mining Extensions (DMX),
            is used to work with data mining models. Like SQL for databases, it has a data definition language
            (DDL) and a data manipulation language (DML); however, the functionality of DMX is built specifically
            for data mining purposes, so it includes special commands to build prediction models and other data
            mining models.





            © 2019 Association of International Certified Professional Accountants. All rights reserved.    2-11