Page 54 - CITP Review

Data mapping and collection
            Data identification and mapping must be based on a thorough understanding of the enterprise
            data structure and data flows, preferably drawn from established documentation.

            Data flows can occur within a system, among systems, and even offline. All of these should be examined
            in order to reflect the full data flow through the entity accurately.


            Data selection
            Depending on the needs of the end system (data warehouse, data mining, or BI system), the next step is
            to select the needed records and variables from the sources identified in step one (mapping and
            collection). Those records or variables not needed will be ignored and automatically filtered out of the
            selection process.
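
            As a rough illustration of the selection step, the sketch below keeps only the records and
            variables that a hypothetical end system needs. The field names, the sample rows, and the
            select_data helper are assumptions made for this example, not part of any particular tool.

```python
# Hypothetical source records; the field names are illustrative only.
source_rows = [
    {"customer_id": 1, "region": "East", "amount": 120.0, "fax": "555-0100"},
    {"customer_id": 2, "region": "West", "amount": 75.5, "fax": None},
    {"customer_id": 3, "region": "East", "amount": 310.0, "fax": "555-0199"},
]


def select_data(rows, needed_fields, keep_record):
    """Keep only the needed records (rows) and variables (fields)."""
    return [
        {field: row[field] for field in needed_fields}
        for row in rows
        if keep_record(row)  # records that fail the test are filtered out
    ]


# Assumed requirement: the end system needs only East-region sales amounts.
selected = select_data(
    source_rows,
    needed_fields=["customer_id", "amount"],
    keep_record=lambda row: row["region"] == "East",
)
print(selected)
```

            In practice the selection rules would come from the documented requirements of the data
            warehouse, data mining, or BI system rather than from a hard-coded filter.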


            Data cleaning
            In the second phase, data is “scrubbed” or cleaned. This cleaning is necessary because data in
            operational databases tends to have missing values, outliers, and inconsistencies.


            Impute missing values

            When values are missing, either they are intentionally ignored or an imputed value is determined
            upon integration. An imputed value is the most probable value for the missing item. Sometimes
            a missing value reflects reality and thus should remain null.

            The formula or process used to impute values needs to be predetermined, logical, documented, and
            repeatable. The system should also track the fact that a missing value was assigned an imputed value.
            This transparency in assigning imputed values is critical to data integrity and to the ability to rely
            on the information being generated.
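
            One possible way to meet these requirements is sketched below. It assumes, purely for
            illustration, that the median of the observed values is an acceptable "most probable value";
            the impute_missing helper and the "_imputed" flag field are hypothetical names. The rule is
            fixed in advance and repeatable, and every imputed value is flagged.

```python
from statistics import median


def impute_missing(rows, field):
    """Fill missing values of `field` with the median of the observed values.

    A companion flag field records whether each value was imputed, so the
    substitution stays visible to downstream users. Raises StatisticsError
    if the field has no observed values at all.
    """
    observed = [row[field] for row in rows if row[field] is not None]
    fill_value = median(observed)  # assumed imputation rule for this sketch
    result = []
    for row in rows:
        was_missing = row[field] is None
        new_row = dict(row)  # leave the source record untouched
        if was_missing:
            new_row[field] = fill_value
        new_row[field + "_imputed"] = was_missing
        result.append(new_row)
    return result


# Hypothetical sample data with one missing age.
rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 50},
    {"id": 4, "age": 40},
]
imputed = impute_missing(rows, "age")
for row in imputed:
    print(row)
```

            Values that are legitimately null would be excluded from such a routine, for example by
            applying it only to fields where a value is always expected.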



            Reduce noise in data
            Noisy data (outliers and errors) has to be corrected as well. Again, an analyst or BI expert would
            need to determine the most acceptable values with which to replace, or smooth out, outliers and errors.
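
            The sketch below shows one common smoothing approach, offered only as an example: values
            outside the interquartile-range (IQR) fences are pulled back to the nearest fence and flagged.
            The 1.5 multiplier, the smooth_outliers helper, and the sample values are assumptions; the
            acceptable rule would be set by the analyst or BI expert.

```python
from statistics import quantiles


def smooth_outliers(values, k=1.5):
    """Clip values lying outside the IQR fences and flag the ones changed.

    Returns (smoothed_values, was_outlier_flags). The multiplier k=1.5 is a
    conventional default, not a rule taken from the source text.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    smoothed = [min(max(v, lower), upper) for v in values]
    flags = [v < lower or v > upper for v in values]
    return smoothed, flags


# Hypothetical daily order counts with one suspicious spike.
orders = [10, 12, 11, 13, 12, 95, 11, 12]
smoothed, flags = smooth_outliers(orders)
print(smoothed)
print(flags)
```

            Clipping is only one option; an expert might instead replace a flagged value with a moving
            average or send it back to the source system for correction.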


            Eliminate inconsistencies
            Inconsistencies have to be fixed using domain knowledge or the assistance of an expert.
            Inconsistencies include unusual values for a variable (column, field).
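
            To make this concrete, the sketch below encodes a small amount of hypothetical domain
            knowledge (a list of valid state codes and known spelling variants) and uses it to standardize
            a field. Values it cannot resolve are flagged for expert review rather than guessed. All names
            and sample values here are assumptions for illustration.

```python
# Hypothetical domain knowledge: valid codes and known variants of them.
CANONICAL_STATES = {"NY", "CA", "TX"}
KNOWN_VARIANTS = {
    "N.Y.": "NY",
    "NEW YORK": "NY",
    "CALIF": "CA",
    "CALIFORNIA": "CA",
    "TEXAS": "TX",
}


def standardize(value):
    """Return (clean_value, needs_review) for one raw field value."""
    key = value.strip().upper()
    if key in CANONICAL_STATES:
        return key, False
    if key in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[key], False
    # Unusual value: keep it as-is and route it to an expert.
    return value, True


raw_values = ["ny", "Calif", " TX ", "Texsa"]
results = [standardize(v) for v in raw_values]
for raw, (clean, needs_review) in zip(raw_values, results):
    print(repr(raw), "->", clean, "(review)" if needs_review else "")
```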


            Data transformation
            Data transformation then takes the clean data and transforms it into a more usable form for data mining
            or BI.








            © 2019 Association of International Certified Professional Accountants. All rights reserved.    2-8