Page 78 - CITP Review
P. 78
Architecture and components
Other essential components of a DW system are to retrieve data; analyze data; extract, transform, and
load (ETL) data; and to provide for adequate management of a data dictionary. DW also includes
business intelligence tools to make effectual use of the data.
Data structure
There are basically two kinds of structure for data in a DW: dimensional data or normalized data.
The normalized data approach — created by Bill Inmon and also called snowflake — normalizes data for a
DW. Tables are grouped together by subject areas — for example, customers, products, employees,
purchases — resulting in dozens of tables linked by the foreign or primary keys. The addition of data from
operational systems to a snowflake DW is fairly straightforward.
The disadvantages are that it can be difficult for users
to join data from different sources to obtain more meaningful information, and
to access the information without thorough knowledge of the data sources and the DW data
structure.
In the dimensional data approach — created by Ralph Kimball and also called star — transactional data is
designed and defined as facts (generally, numeric data) or dimensions (reference information that adds
meaning to the facts). For example, a sales transaction has quantities sold and dollar amounts (that is,
facts); it also includes the date of sale, region in which it was sold, the sales representative who made the
sale, the category of inventory or service, and a variety of other relevant data that describes the numeric
data (that is, dimensions).
Generally, the dimensional approach is considered the easier of the two to use and for users to grasp
conceptually. Data retrieval tends to be more efficient in dimensional data, partly because it has been
denormalized, reducing the number of files needed to extract a set of data. Business users like the
dimension approach because of the ability to easily “slice and dice” the data into sales by region, sales by
time period, sales by sales rep, sales by sales rep and time frame, and so on. In addition, use of
dimensions generally allows for a more efficient management of GL accounts. ERP systems are moving
toward the dimensional approach for data analysis and reporting.
The disadvantages of dimensional data are that
loading the data from different systems and databases is complicated to get it into dimensional form,
and
it is difficult to modify the DW structure if the entity changes the way it does business processes or
captures data.
Also, there is a tendency for larger entities to struggle with the “one way” dimensions that have to be
defined, for example, everyone must refer to and use city, state, and zip code as CSZ, and it must be one
field, not three.
These data structures are not mutually exclusive. Star data structures are sometimes subjected to some
normalization. There are also other approaches.
© 2019 Association of International Certified Professional Accountants. All rights reserved. 2-32