Page 55 - CITP Review

            Normalize data
            The first transformation is to normalize the data.³ Normalizing puts all variables (columns, fields) on
            the same scale, so that data analyses treat them equally.
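            A minimal Python sketch of min-max normalization, assuming the 0-to-1 scaling described in footnote 3.
            The function name min_max_normalize and the sample figures are hypothetical, chosen only to show two
            columns on very different scales ending up on the same 0-to-1 scale.

```python
def min_max_normalize(values):
    """Rescale a column so its lowest value maps to 0.0 and its highest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; mapping every value to 0.0
        # is a convention assumed here, not part of the definition.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns on very different scales.
fixed_assets = [10000.0, 50000.0, 30000.0]
office_supplies = [40.0, 240.0, 90.0]

print(min_max_normalize(fixed_assets))     # [0.0, 1.0, 0.5]
print(min_max_normalize(office_supplies))  # [0.0, 1.0, 0.25]
```

            After rescaling, neither column dominates an analysis simply because its raw amounts are larger.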



            Discretize and aggregate data
            It may also be best to discretize certain data, that is, to convert numeric values to categorical values
            (for example, high, medium, low). Data might also need to be aggregated; for instance, instead of using
            50 states, the data might be aggregated into just four regions. Data is discretized or aggregated because
            of its intended use in the data warehouse, usually to simplify reporting and use of the data.
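            The two transformations might be sketched in Python as follows. The cutoffs separating low, medium, and
            high, and the state-to-region mapping (a small subset of the four U.S. Census Bureau regions), are
            illustrative assumptions; an actual warehouse would use whatever categories and groupings its reporting
            requires.

```python
from collections import defaultdict


def discretize(value, low_cutoff, high_cutoff):
    """Map a numeric value to 'low', 'medium', or 'high' using two cutoffs."""
    if value < low_cutoff:
        return "low"
    if value < high_cutoff:
        return "medium"
    return "high"


# Hypothetical state-to-region mapping (only a few states shown).
STATE_TO_REGION = {
    "NY": "Northeast", "MA": "Northeast",
    "IL": "Midwest", "OH": "Midwest",
    "TX": "South", "FL": "South",
    "CA": "West", "WA": "West",
}


def aggregate_by_region(amount_by_state):
    """Sum state-level amounts into region-level totals (unmapped states raise KeyError)."""
    totals = defaultdict(float)
    for state, amount in amount_by_state.items():
        totals[STATE_TO_REGION[state]] += amount
    return dict(totals)


print([discretize(x, 1000, 5000) for x in [250, 1200, 7500]])
# ['low', 'medium', 'high']
print(aggregate_by_region({"NY": 100.0, "MA": 50.0, "TX": 75.0, "CA": 20.0}))
# {'Northeast': 150.0, 'South': 75.0, 'West': 20.0}
```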
            It is also fairly common for data to be kept in detail, lightly summarized in a separate file, and highly
            summarized in yet a third file. That way, different users can view the data at different levels of
            aggregation without waiting for the system to perform the aggregation, and they can drill down quickly
            and efficiently from the highly summarized view or report to the more detailed data.
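            One way to picture the detail, lightly summarized, and highly summarized files is as totals precomputed
            at each level. This Python sketch keeps the three levels as in-memory tables (an assumption for
            illustration; a real DW would store them as separate tables or files), and the hypothetical drill_down
            helper moves from a region total to the state totals beneath it without re-aggregating the detail.

```python
from collections import defaultdict

# Hypothetical detail records: (region, state, month, amount).
DETAIL = [
    ("Northeast", "NY", "2019-01", 100.0),
    ("Northeast", "NY", "2019-02", 80.0),
    ("Northeast", "MA", "2019-01", 50.0),
    ("South", "TX", "2019-01", 75.0),
]


def summarize(records, key_fn):
    """Precompute totals grouped by key_fn so reports need not re-aggregate."""
    totals = defaultdict(float)
    for rec in records:
        totals[key_fn(rec)] += rec[3]
    return dict(totals)


# Lightly summarized: one row per (region, state).
LIGHT = summarize(DETAIL, lambda r: (r[0], r[1]))
# Highly summarized: one row per region.
HIGH = summarize(DETAIL, lambda r: r[0])


def drill_down(region):
    """From a region total in HIGH, list the state totals in LIGHT that make it up."""
    return {state: amt for (reg, state), amt in LIGHT.items() if reg == region}


print(HIGH)                     # {'Northeast': 230.0, 'South': 75.0}
print(drill_down("Northeast"))  # {'NY': 180.0, 'MA': 50.0}
```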


            Construct new attributes

            Sometimes it is efficient to create new attributes (variables, columns, fields) that increase information
            content and reduce the complexity of relationships in the data. When a relationship is not directly
            identified by the existing data, a new attribute can be created whose values describe that relationship.
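            As a hypothetical example, invoice and payment dates together imply how long a customer took to pay, but
            that relationship is not stored as its own attribute. The sketch below derives a new days_to_pay
            attribute; the record layout and field names are assumptions.

```python
from datetime import date

# Hypothetical invoice records: the payment delay is implied by two dates
# but is not stored as its own attribute.
invoices = [
    {"invoice_id": 1, "invoiced": date(2019, 3, 1), "paid": date(2019, 3, 31)},
    {"invoice_id": 2, "invoiced": date(2019, 3, 5), "paid": date(2019, 3, 12)},
]


def add_days_to_pay(records):
    """Return copies of the records with a derived 'days_to_pay' attribute."""
    enriched = []
    for rec in records:
        new_rec = dict(rec)  # copy so the source records are left unchanged
        new_rec["days_to_pay"] = (rec["paid"] - rec["invoiced"]).days
        enriched.append(new_rec)
    return enriched


for rec in add_days_to_pay(invoices):
    print(rec["invoice_id"], rec["days_to_pay"])
# 1 30
# 2 7
```

            Analysts can now filter or group on days_to_pay directly instead of recomputing the date difference in
            every query.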



            Data reduction
            Although one goal of a data warehouse (DW) is to accumulate a vast array of data, too much data can become
            unmanageable and make the database too complicated to understand and use; therefore, DW developers often
            reduce the volume of data being loaded into the DW database.



            Reduce the number of variables
            DW developers may also reduce the number of fields or variables to a more manageable number. Domain
            experts and statistical results can help determine which variables, and how many, to keep.
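            Domain expertise cannot be coded, but the statistical side can be sketched. The Python below applies two
            common screening rules, chosen here as assumptions rather than as the guide's method: drop columns that
            barely vary, and drop a column that is almost perfectly correlated with one already kept. The thresholds
            and column names are hypothetical.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def select_variables(columns, min_variance=1e-9, max_abs_corr=0.95):
    """Return names of columns that vary and are not near-duplicates of a kept column."""
    kept = {}
    for name, values in columns.items():
        # A (near-)constant column carries almost no information; skipping it
        # also keeps pearson() from dividing by zero below.
        if pvariance(values) <= min_variance:
            continue
        # A column almost perfectly correlated with one already kept is redundant.
        if any(abs(pearson(values, other)) >= max_abs_corr for other in kept.values()):
            continue
        kept[name] = values
    return list(kept)


# Hypothetical columns: amount_cents duplicates amount_usd; currency_flag never changes.
columns = {
    "amount_usd": [100, 200, 300, 400],
    "amount_cents": [10000, 20000, 30000, 40000],
    "currency_flag": [1, 1, 1, 1],
    "discount_pct": [5, 0, 10, 2],
}
print(select_variables(columns))  # ['amount_usd', 'discount_pct']
```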


            Reduce the number of records

            The same principle applies to the number of records being loaded. Too many records can slow down use of
            the DW. Once again, domain experts or statistical analysis might be used to determine whether some of the
            records could be eliminated without losing significant information.
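            A minimal sketch of two record-reduction steps, assuming records are hashable tuples: remove exact
            duplicates, then keep a reproducible simple random sample. Random sampling is only one possible technique
            and can miss rare but important records, which is why the judgment of domain experts still matters. The
            function name, seed, and sample data are hypothetical.

```python
import random


def reduce_records(records, sample_fraction, seed=42):
    """Drop exact duplicates, then keep a reproducible simple random sample."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(records))
    if not unique:
        return []
    # Keep at least one record and never ask for more records than exist.
    k = min(len(unique), max(1, round(len(unique) * sample_fraction)))
    # A seeded generator yields the same sample on every load.
    rng = random.Random(seed)
    return rng.sample(unique, k)


# Hypothetical records: (id, state, amount), plus one exact duplicate.
records = [(i, "NY" if i % 2 else "TX", float(i)) for i in range(1000)]
records.append((0, "TX", 0.0))

sample = reduce_records(records, 0.10)
print(len(sample))  # 100
```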


            3  Normalizing data means transforming all values in a column (variable, field) to a number between 0
            (equivalent to the lowest value) and 1 (equivalent to the highest value); the new value represents the
            relative position, as a percentage, of where that value falls between the lowest and highest values. The
            reason to normalize data is to prevent columns with large values from getting too much weight in data
            analyses (for example, purchases of fixed assets outweighing purchases of office supplies).
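            The rescaling in footnote 3 corresponds to the standard min-max formula, where x_min and x_max are the
            lowest and highest values in the column (assumed to differ). The values 40, 240, and 90 in the worked
            line are hypothetical.

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad 0 \le x' \le 1

x_{\min} = 40,\; x_{\max} = 240,\; x = 90:\quad
x' = \frac{90 - 40}{240 - 40} = \frac{50}{200} = 0.25
```

            That is, the value 90 sits 25 percent of the way from the lowest value to the highest.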


            © 2019 Association of International Certified Professional Accountants. All rights reserved.    2-9