Page 78 - CITP Review
P. 78

Architecture and components
            Other essential components of a DW system are to retrieve data; analyze data; extract, transform, and
            load (ETL) data; and to provide for adequate management of a data dictionary. DW also includes
            business intelligence tools to make effectual use of the data.


            Data structure

            There are basically two kinds of structure for data in a DW: dimensional data or normalized data.
            The normalized data approach — created by Bill Inmon and also called snowflake — normalizes data for a
            DW. Tables are grouped together by subject areas — for example, customers, products, employees,
            purchases — resulting in dozens of tables linked by the foreign or primary keys. The addition of data from
            operational systems to a snowflake DW is fairly straightforward.

            The disadvantages are that it can be difficult for users

              to join data from different sources to obtain more meaningful information, and
              to access the information without thorough knowledge of the data sources and the DW data
               structure.
            In the dimensional data approach — created by Ralph Kimball and also called star — transactional data is
            designed and defined as facts (generally, numeric data) or dimensions (reference information that adds
            meaning to the facts). For example, a sales transaction has quantities sold and dollar amounts (that is,
            facts); it also includes the date of sale, region in which it was sold, the sales representative who made the
            sale, the category of inventory or service, and a variety of other relevant data that describes the numeric
            data (that is, dimensions).

            Generally, the dimensional approach is considered the easier of the two to use and for users to grasp
            conceptually. Data retrieval tends to be more efficient in dimensional data, partly because it has been
            denormalized, reducing the number of files needed to extract a set of data. Business users like the
            dimension approach because of the ability to easily “slice and dice” the data into sales by region, sales by
            time period, sales by sales rep, sales by sales rep and time frame, and so on. In addition, use of
            dimensions generally allows for a more efficient management of GL accounts. ERP systems are moving
            toward the dimensional approach for data analysis and reporting.

            The disadvantages of dimensional data are that

              loading the data from different systems and databases is complicated to get it into dimensional form,
               and
              it is difficult to modify the DW structure if the entity changes the way it does business processes or
               captures data.
            Also, there is a tendency for larger entities to struggle with the “one way” dimensions that have to be
            defined, for example, everyone must refer to and use city, state, and zip code as CSZ, and it must be one
            field, not three.


            These data structures are not mutually exclusive. Star data structures are sometimes subjected to some
            normalization. There are also other approaches.


            © 2019 Association of International Certified Professional Accountants. All rights reserved.    2-32
   73   74   75   76   77   78   79   80   81   82   83