Page 386 - Using MIS

354       Chapter 9  Business Intelligence Systems

                                       Consider the first problem: too many attributes. Suppose we want to know the factors that
                                    influence how customers respond to a promotion. If we combine internal customer data with
                                    purchased customer data, we will have more than a hundred different attributes to consider.
                                    How do we select among them? In Drew and Addison’s case, they just ignored the columns they
                                    didn’t need. But in more sophisticated data mining analyses, too many attributes can be
                                    problematic. Because of a phenomenon called the curse of dimensionality, the more attributes
                                    there are, the easier it is to build a model that fits the sample data but that is worthless as
                                    a predictor. There are other good reasons for reducing the number of attributes, and one of the
                                    major activities in data mining concerns efficient and effective ways of selecting attributes.
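
The effect described above can be seen in a small simulation. The following is a minimal sketch, not a method from this chapter: every attribute and every customer response is a coin flip, so no attribute has real predictive power. The function names (`accuracy`, `best_attribute`, `run_experiment`) and the sizes (20 sample rows, up to 2,000 attributes) are assumptions chosen for illustration. The "model" is simply the single attribute that best matches the responses in the sample.

```python
import random

def accuracy(rows, labels, j):
    """Share of rows in which attribute j equals the response label."""
    hits = sum(1 for row, y in zip(rows, labels) if row[j] == y)
    return hits / len(rows)

def best_attribute(rows, labels, k):
    """Index of the attribute, among the first k, that best fits the sample."""
    return max(range(k), key=lambda j: accuracy(rows, labels, j))

def run_experiment(n_sample=20, n_fresh=500, n_attrs=2000, seed=0):
    rng = random.Random(seed)

    def coin_rows(n):
        # Every attribute is a coin flip, so none has real predictive power.
        return [[rng.random() < 0.5 for _ in range(n_attrs)] for _ in range(n)]

    sample, fresh = coin_rows(n_sample), coin_rows(n_fresh)
    sample_y = [rng.random() < 0.5 for _ in range(n_sample)]
    fresh_y = [rng.random() < 0.5 for _ in range(n_fresh)]

    results = []
    for k in (5, 50, 500, n_attrs):
        j = best_attribute(sample, sample_y, k)
        # Record how well the chosen attribute fits the sample and fresh data.
        results.append((k, accuracy(sample, sample_y, j), accuracy(fresh, fresh_y, j)))
    return results

if __name__ == "__main__":
    for k, fit, fresh_acc in run_experiment():
        print(f"{k:5d} attributes: sample fit {fit:.2f}, fresh-data accuracy {fresh_acc:.2f}")
```

Because each larger attribute set contains the smaller ones, the sample fit can only stay the same or improve as attributes are added, while accuracy on fresh data should stay near chance. That gap is the sense in which such a model fits the sample but is worthless as a predictor.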
                                       The second way to have an excess of data is to have too many data points—too many rows
                                    of data. Suppose we want to analyze clickstream data on CNN.com. How many clicks does that
                                    site receive per month? Millions upon millions! In order to meaningfully analyze such data we
                                    need to reduce the amount of data. One good solution to this problem is statistical sampling.
                                    Organizations should not be reluctant to sample data in such situations.
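
As a rough illustration of why sampling works, the sketch below builds a simulated clickstream and compares each page's share of clicks in the full data with its share in a simple random sample. The page names, traffic mix, and sizes are invented for the example and do not describe CNN.com or any real site.

```python
import random

PAGES = ["home", "politics", "sports", "business"]  # hypothetical page names
WEIGHTS = [0.4, 0.3, 0.2, 0.1]                      # hypothetical traffic mix

def make_clickstream(n_clicks, seed=0):
    """Simulate a clickstream as a list of page names, one entry per click."""
    rng = random.Random(seed)
    return rng.choices(PAGES, weights=WEIGHTS, k=n_clicks)

def share(clicks, page):
    """Fraction of clicks that landed on the given page."""
    return sum(1 for c in clicks if c == page) / len(clicks)

def sample_clicks(clicks, n, seed=1):
    """Simple random sample of n clicks, drawn without replacement."""
    return random.Random(seed).sample(clicks, n)

if __name__ == "__main__":
    clicks = make_clickstream(200_000)
    sample = sample_clicks(clicks, 10_000)
    for page in PAGES:
        print(f"{page:9s} full data {share(clicks, page):.3f}   sample {share(sample, page):.3f}")
```

In this sketch a sample of 5 percent of the rows should estimate each page's share closely, which is why analyzing a well-drawn sample is usually acceptable when the full data set is too large to process conveniently.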

                                    Data Warehouses Versus Data Marts

                                    To understand the difference between data warehouses and data marts, think of a data
                                    warehouse as a distributor in a supply chain. The data warehouse takes data from the data
                                    manufacturers (operational systems and other sources), cleans and processes the data, and
                                    locates the data on the shelves, so to speak, of the data warehouse. The data analysts who
                                    work with a data warehouse are experts at data management, data cleaning, data transformation,
                                    data relationships, and the like. However, they are not usually experts in a given business
                                    function.
                                       A data mart is a data collection, smaller than the data warehouse, that addresses the needs
                                    of a particular department or functional area of the business. If the data warehouse is the
                                    distributor in a supply chain, then a data mart is like a retail store in a supply chain. Users
                                    in the data mart obtain data that pertain to a particular business function from the data
                                    warehouse. Such users do not have the data management expertise that data warehouse employees
                                    have, but they are knowledgeable analysts for a given business function.
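
The distributor-and-retailer analogy can be sketched in a few lines of code. This is a toy model under stated assumptions, not how a real data warehouse is built: the record layout, the `source` values, and the function names `load_warehouse` and `build_mart` are all hypothetical. The warehouse step cleans incoming rows from several producers, and each mart is the smaller subset that one business function needs.

```python
# Hypothetical rows arriving from three data producers.
RAW_ROWS = [
    {"source": "web_log", "item": "/home", "value": 120},
    {"source": "web_log", "item": "/checkout", "value": None},  # incomplete record
    {"source": "store_sales", "item": "milk", "value": 3},
    {"source": "store_sales", "item": "bread", "value": 2},
    {"source": "inventory", "item": "bin-17", "value": 40},
]

def load_warehouse(raw_rows):
    """Warehouse step: clean the incoming data by dropping incomplete rows."""
    return [dict(row) for row in raw_rows if row["value"] is not None]

def build_mart(warehouse, source):
    """Mart step: keep only the rows that serve one business function."""
    return [row for row in warehouse if row["source"] == source]

if __name__ == "__main__":
    warehouse = load_warehouse(RAW_ROWS)
    for source in ("web_log", "store_sales", "inventory"):
        mart = build_mart(warehouse, source)
        print(f"{source:12s} mart holds {len(mart)} of {len(warehouse)} warehouse rows")
```

The division of labor mirrors the text: cleaning happens once in the warehouse, so the analysts who use a mart can concentrate on their business function rather than on data management.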
                                       Figure 9-15 illustrates these relationships. In this example, the data warehouse takes data
                                    from the data producers and distributes the data to three data marts. One data mart is used to
                                    analyze clickstream data for the purpose of designing Web pages. A second analyzes store sales
                                    data and determines which products tend to be purchased together. This information is used to





                                    [Figure 9-15: Data Mart Examples. Data producers feed the data warehouse (data warehouse
                                    DBMS, data warehouse metadata, and data warehouse database), which supplies three data marts:
                                    the Web Sales Data Mart (Web log data; BI tools for Web clickstream analysis; Web page design
                                    features), the Store Sales Data Mart (store sales data; BI tools for store management;
                                    market-basket analysis for sales training), and the Inventory Data Mart (inventory history
                                    data; BI tools for inventory management; inventory layout for optimal item picking).]