Page 54 - CITP Review
Data mapping and collection
Data identification and mapping must be based on a thorough understanding of the enterprise's data structures and data flows, preferably drawn from established documentation.
Data flows can occur within a system, between systems, and even offline. All of these should be examined to accurately reflect the full flow of data through the entity.
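As a minimal sketch of recording and tracing data flows, the Python below assumes the mapping is captured as a list of (source, destination, channel) entries and traces every downstream destination of one source. The system names, the `FLOWS` list, and both helper functions are hypothetical; a real inventory would come from the entity's own documentation. The visited set keeps loops, such as an offline spreadsheet adjustment that feeds back into the ledger, from being traced forever.

```python
from collections import deque

# Hypothetical data-flow map: (source, destination, channel).
# The channel records whether the flow stays within a system, moves
# between systems, or happens offline (e.g., a manual spreadsheet step).
FLOWS = [
    ("order_entry", "order_entry.archive", "within system"),
    ("order_entry", "billing", "between systems"),
    ("billing", "general_ledger", "between systems"),
    ("general_ledger", "spreadsheet_adjustments", "offline"),
    ("spreadsheet_adjustments", "general_ledger", "offline"),
]


def trace_downstream(start, flows):
    """Return, sorted, every destination reachable from `start`."""
    graph = {}
    for src, dst, _channel in flows:
        graph.setdefault(src, []).append(dst)
    seen = {start}  # visited set stops cycles such as the offline round trip
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report destinations only, not the starting point
    return sorted(seen)


def channels_covered(flows):
    """Return the set of flow channels present in the map."""
    return {channel for _src, _dst, channel in flows}
```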
Data selection
Depending on the needs of the end system (data warehouse, data mining, or BI system), the next step is to select the records and variables needed from the sources identified in step one (mapping and collection). Records and variables that are not needed are excluded from the selection.
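A minimal sketch of selection, assuming records arrive as Python dictionaries: rows are kept by a filter condition, and only the named variables are carried forward, so anything not listed is dropped. The field names, sample records, and the `select_records` helper are hypothetical.

```python
# Hypothetical records gathered during mapping and collection.
SOURCE_RECORDS = [
    {"customer_id": 1, "region": "West", "revenue": 1200.0, "fax_number": "555-0101"},
    {"customer_id": 2, "region": "East", "revenue": 800.0, "fax_number": None},
    {"customer_id": 3, "region": "West", "revenue": 450.0, "fax_number": None},
]


def select_records(records, keep_row, columns):
    """Keep rows for which keep_row(row) is true, and only the listed columns."""
    return [{col: row[col] for col in columns} for row in records if keep_row(row)]


# Example: the end system needs only West-region customers and two variables.
selected = select_records(
    SOURCE_RECORDS, lambda row: row["region"] == "West", ["customer_id", "revenue"]
)
```

Here `fax_number` and `region` never reach the end system because they are not in the column list.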
Data cleaning
In the next phase, data is “scrubbed,” or cleaned. Cleaning is necessary because data in operational databases tends to contain missing values, outliers, and inconsistencies.
Impute missing values
When values are missing, they are either intentionally left blank or replaced with an imputed value during integration. An imputed value is the most probable value for the missing field. Sometimes a missing value reflects reality and should therefore remain null.
The formula or process used to impute values must be predetermined, logical, documented, and repeatable. The system should also record that an imputed value was assigned in place of a missing one. This transparency is critical to data integrity and to the reliability of the information generated.
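The sketch below illustrates these requirements under stated assumptions: the fill rule is fixed in code (so it is predetermined, documented, and repeatable), and every output row carries a flag recording whether its value was imputed. It uses the most frequent observed value as a simple stand-in for the "most probable value" of a categorical field, and a hypothetical `keep_null` rule leaves genuinely missing values (here, shipping method on digital orders) as null. All field names and records are illustrative.

```python
from collections import Counter

# Hypothetical order records; shipping_method is missing in two rows.
ORDERS = [
    {"order_id": 1, "product_type": "physical", "shipping_method": "ground"},
    {"order_id": 2, "product_type": "physical", "shipping_method": "ground"},
    {"order_id": 3, "product_type": "physical", "shipping_method": "air"},
    {"order_id": 4, "product_type": "physical", "shipping_method": None},
    {"order_id": 5, "product_type": "digital", "shipping_method": None},
]


def impute_most_frequent(records, column, keep_null=lambda row: False):
    """Fill missing values in `column` with its most frequent observed value.

    Rows where keep_null(row) is true stay null because the gap reflects
    reality. Every output row gets a `<column>_imputed` flag for auditing.
    """
    observed = [row[column] for row in records if row[column] is not None]
    if not observed:
        raise ValueError(f"no observed values in {column!r} to impute from")
    fill = Counter(observed).most_common(1)[0][0]
    result = []
    for row in records:
        out = dict(row)  # copy so the source records are not modified
        needs_fill = out[column] is None and not keep_null(out)
        if needs_fill:
            out[column] = fill
        out[column + "_imputed"] = needs_fill
        result.append(out)
    return result


cleaned = impute_most_frequent(
    ORDERS, "shipping_method", keep_null=lambda row: row["product_type"] == "digital"
)
```

Order 4 receives "ground" with its flag set, while order 5 stays null because a digital order genuinely has no shipping method.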
Reduce noise in data
Noisy data (outliers and errors) must also be corrected. Again, an analyst or BI expert needs to determine the most acceptable values with which to replace, or smooth out, outliers and errors.
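One common way to smooth outliers, shown here as an illustration rather than a prescribed method, is to clip values that fall outside Tukey's fences (1.5 times the interquartile range beyond the first and third quartiles) to the nearest fence. The multiplier `k`, the function name, and the sample data are assumptions. An analyst would still decide whether clipping is acceptable for a given variable and whether a flagged value is actually an error.

```python
import statistics


def clip_to_tukey_fences(values, k=1.5):
    """Replace values outside the Tukey fences with the nearest fence.

    Fences are Q1 - k*IQR and Q3 + k*IQR. Values inside the fences are
    returned unchanged; the input list is not modified.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


daily_units = [10, 11, 12, 13, 14, 15, 500]  # hypothetical; 500 looks like a keying error
smoothed = clip_to_tukey_fences(daily_units)
```

With this data the quartiles are 11 and 15, so the upper fence is 21 and the value 500 is pulled down to 21.0.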
Eliminate inconsistencies
Inconsistencies must be fixed using domain knowledge or the assistance of an expert. Inconsistencies include unusual values for a variable (column, field).
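A minimal sketch of fixing inconsistent values, assuming the domain expert's knowledge is captured as a lookup table from variant spellings to a canonical code. Values not in the table are passed through unchanged and returned separately for expert review rather than guessed. The table contents and the `standardize` function are hypothetical.

```python
# Hypothetical lookup table supplied by a domain expert:
# known variant spellings (lowercased) -> canonical state code.
STATE_CANONICAL = {
    "ca": "CA", "calif": "CA", "california": "CA",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
}


def standardize(values, canonical):
    """Map known variants to canonical values; collect unknown values for review.

    Unknown values are passed through unchanged rather than guessed.
    """
    cleaned, needs_review = [], []
    for value in values:
        key = value.strip().lower()
        if key in canonical:
            cleaned.append(canonical[key])
        else:
            cleaned.append(value)
            needs_review.append(value)
    return cleaned, needs_review


states, needs_review = standardize(["CA", " Calif ", "N.Y.", "Cal"], STATE_CANONICAL)
```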
Data transformation
Data transformation then converts the cleaned data into a form more usable for data mining or BI.
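Two transformations often applied at this stage are sketched below as assumptions rather than a required set: rolling detail records up to a summary level (typical for BI reporting) and min-max scaling a numeric variable to the 0 to 1 range (common before data mining). Function names, field names, and records are hypothetical.

```python
from collections import defaultdict


def total_by(records, key, measure):
    """Sum `measure` for each distinct value of `key` (a simple roll-up)."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row[measure]
    return dict(totals)


def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when every value is the same
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical cleaned sales records.
sales = [
    {"region": "West", "revenue": 100.0},
    {"region": "East", "revenue": 50.0},
    {"region": "West", "revenue": 25.0},
]
by_region = total_by(sales, "region", "revenue")
scaled = min_max_scale([100.0, 200.0, 300.0])
```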
© 2019 Association of International Certified Professional Accountants. All rights reserved. 2-8