Page 75 - CITP Review
These biases are not easily fixed because the deep learning process is ongoing; you may not realize the
downstream impacts of your data and choices until much later. In this respect, artificial biases and
human biases are similar: both are usually unknown at first, and clarity emerges over time as analysis occurs.
Data integration
Earlier in this chapter we discussed data mapping. Using the understanding of the data from the
mapping process, selected data can be integrated into a system or database, or combined with other
data to form more meaningful results. Records and variables that come from various data sources are
integrated into the target database or combined with the use of a data visualization tool. The following
sections cover aspects of this process.
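As a rough illustration of the paragraph above, the following sketch uses a data map to rename fields from one source and then combines those records with records from a second source. The source systems, field names (cust_id, CustomerNo, Amount), and sample values are hypothetical and chosen only for illustration; a real integration would usually rely on a database or an integration tool rather than in-memory lists.

```python
# Hypothetical records from two source systems (illustrative values only).
crm_customers = [
    {"cust_id": "C001", "name": "Acme Ltd"},
    {"cust_id": "C002", "name": "Birch Co"},
]
billing_invoices = [
    {"CustomerNo": "C001", "Amount": 1200.0},
    {"CustomerNo": "C001", "Amount": 300.0},
    {"CustomerNo": "C003", "Amount": 75.0},
]

# Data map produced by the mapping exercise: source field -> target field.
INVOICE_FIELD_MAP = {"CustomerNo": "cust_id", "Amount": "amount"}


def apply_map(record, field_map):
    """Rename a source record's fields to the target names in the map."""
    return {target: record[source] for source, target in field_map.items()}


def integrate(customers, invoices):
    """Combine invoices with customer names; report invoices with no match."""
    names = {c["cust_id"]: c["name"] for c in customers}
    combined, unmatched = [], []
    for raw in invoices:
        row = apply_map(raw, INVOICE_FIELD_MAP)
        if row["cust_id"] in names:
            combined.append({**row, "name": names[row["cust_id"]]})
        else:
            # Kept for follow-up rather than silently dropped.
            unmatched.append(row)
    return combined, unmatched


combined, unmatched = integrate(crm_customers, billing_invoices)
```

Returning the unmatched records separately is a design choice: records that fail to integrate are often where data quality problems first become visible.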
Extract, transform, and load
Because DW data is a coherent collage of data from a variety of databases, a sophisticated and
structured approach is needed to gather and import data into the DW database. That
process is referred to as ETL (extract, transform, and load), described as follows:
Extract refers to extracting data from the various data sources outside of the DW.
Transform refers to the transformation of that data into usable form for the purposes of the DW, for
example, data analytics, data mining, or other business intelligence purposes.
Load refers to loading the transformed data into the DW database.
Usually the DW database holds cumulative data spanning several years; however, the timing of updates
varies: the database could be updated frequently (daily) or infrequently (monthly). Eventually, some data
will need to be overwritten or deleted in the DW database.
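The three ETL steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production design: an in-memory SQLite database stands in for the DW, and the table name (sales_fact), field names, and sample rows are hypothetical.

```python
import sqlite3


def extract(source_rows):
    """Extract: stand-in for reading records from a source outside the DW."""
    return list(source_rows)


def transform(rows):
    """Transform: standardize region codes and convert sales text to numbers."""
    return [
        (r["id"], r["region"].strip().upper(), round(float(r["sales"]), 2))
        for r in rows
    ]


def load(conn, rows):
    """Load: write transformed rows to the DW table, overwriting on same id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(id INTEGER PRIMARY KEY, region TEXT, sales REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical source data with inconsistent formatting.
source = [
    {"id": 1, "region": " east ", "sales": "100.50"},
    {"id": 2, "region": "West", "sales": "20"},
]

dw = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the DW
load(dw, transform(extract(source)))
result = dw.execute("SELECT id, region, sales FROM sales_fact ORDER BY id").fetchall()
```

Here the load step overwrites any existing row with the same id, which is only one possible update policy; a real DW might instead append rows or keep history.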
Several aspects of ETL make it a high-risk process, including the following:
Anytime data is transferred from one system to another, there is a relatively high inherent risk (IR).
In the transform step, data is being manipulated.
Rules are applied to data in the load step.
There is a substantial need for controls to mitigate these risks.
The CITP can assist in identifying those risks, developing effective controls and procedures to mitigate
them, and analyzing the resulting database for data quality and integrity.
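One common type of control over a data transfer is a reconciliation of record counts and control totals between the source and the loaded data. The sketch below is an illustrative assumption about how such a check might look, not a control prescribed by the text; the function names and the amount field are hypothetical.

```python
from decimal import Decimal


def control_totals(rows, amount_field):
    """Compute a record count and a control total for a set of rows."""
    total = sum((Decimal(str(r[amount_field])) for r in rows), Decimal("0"))
    return {"count": len(rows), "total": total}


def reconcile(source_rows, loaded_rows, amount_field="amount"):
    """Return a list of exceptions; an empty list means the load reconciles."""
    src = control_totals(source_rows, amount_field)
    tgt = control_totals(loaded_rows, amount_field)
    exceptions = []
    if src["count"] != tgt["count"]:
        exceptions.append(
            f"record count: source={src['count']} loaded={tgt['count']}"
        )
    if src["total"] != tgt["total"]:
        exceptions.append(
            f"control total: source={src['total']} loaded={tgt['total']}"
        )
    return exceptions


# Hypothetical example: one record was dropped during the load.
source_rows = [{"amount": "10.10"}, {"amount": "5.00"}]
loaded_ok = [{"amount": "10.10"}, {"amount": "5.00"}]
loaded_bad = [{"amount": "10.10"}]

ok_exceptions = reconcile(source_rows, loaded_ok)
bad_exceptions = reconcile(source_rows, loaded_bad)
```

Decimal is used instead of float so that control totals compare exactly. Note that a check like this detects dropped or altered amounts but would not, on its own, catch errors in non-numeric fields.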
Enterprise application integration (EAI)
Integration frameworks are important for identifying and managing the flow of data within an enterprise’s IT
environment. As a part of planning, management should consider how to adopt a framework that covers
data flows between the organization’s applications.
If allowed to follow a natural ad hoc evolution, an entity will purchase or build various systems and
applications that essentially operate in virtual silos; that is, the systems will not automatically and
effectively interface or integrate with all of the other systems and applications.
© 2019 Association of International Certified Professional Accountants. All rights reserved. 2-29