Page 414 - Using MIS

382       Chapter 9  Business Intelligence Systems
















        Figure 9-30
        Third-Party Cookie Growth
        Panels: a. Display on Startup; b. After MSN.com and Gmail;
        c. Five Sites Visited Yield 27 Third Parties; d. Sites Connected to doubleclick
        Source: © Mozilla



        Lightbeam will highlight it in the data column on the right. As
        you can see in Figure 9-30d, after visiting seven sites, doubleclick
        was connected to a total of 16 other sites, at most seven of which
        can be sites I visited. So, doubleclick is connecting to sites I
        don’t even know about, and doing so on my computer. Examine the
        connection column on the right. I visited Msn, Amazon, Mynorthwest,
        and Wsj, but who are Bluekai and Rubiconproject? I never heard of
        them until I saw this display. They, apparently, have heard of me,
        however!
           Third-party cookies generate incredible volumes of log data. For
        example, suppose a company, such as doubleclick, shows 100 ads to a
        given computer in a day. If it is showing ads to 10 million
        computers (possible), that is a total of 1 billion log entries per
        day, or 365 billion a year. Truly this is BigData.
           Storage is essentially free, but how can they possibly process
        all that data? How do they parse the log to find entries just for
        your computer? How do they integrate data from different cookies on
        the same IP address? How do they analyze those entries to determine
        which ads you clicked on? How do they then characterize differences
        in ads to determine which characteristics matter most to you? The
        answer, as you learned in Q6, is to use parallel processing. Using a
        MapReduce algorithm, they distribute the work to thousands of
        processors that work in parallel. They then aggregate the results of
        these independent processors and then, possibly, move to a second
        phase of analysis where they do it again. Hadoop, the open-source
        program that you learned about in Q6, is a favorite for this
        process.
           (See the collaboration exercise on pages 380–381 for a
        continuation of the discussion. Third-party cookies: problem or
        opportunity?)

        Questions

        9-16.  Using your own words, explain how third-party cookies are
               created.
        9-17.  Suppose you are an ad-serving company, and you maintain a log
               of cookie data for ads you serve to Web pages for a particular
               vendor (say Amazon).
               a.  How can you use this data to determine which are the best
                   ads?
               b.  How can you use this data to determine which are the best
                   ad formats?
               c.  How could you use records of past ads and ad clicks to
                   determine which ads to send to a given IP address?
               d.  How could you use this data to determine how well the
                   technique you used in your answer to question c was
                   working?
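
        The MapReduce idea described in the passage on parallel processing can be illustrated with a small sketch. It is only an illustration under stated assumptions, not how doubleclick or Hadoop actually work: the log format (cookie ID, ad ID, clicked flag), the sample entries, and all function names are hypothetical. The chunks are processed one after another here, whereas on a real cluster each chunk would go to a separate processor.

```python
from collections import Counter
from functools import reduce

# Hypothetical log entries: (cookie_id, ad_id, clicked). Format and data are
# assumptions made for illustration; they do not come from the text.
LOG = [
    ("cookie-A", "ad-1", True),
    ("cookie-B", "ad-1", False),
    ("cookie-A", "ad-2", False),
    ("cookie-A", "ad-1", True),
    ("cookie-B", "ad-2", True),
    ("cookie-A", "ad-2", False),
]


def split_log(entries, n_chunks):
    """Divide the log into chunks, one per (simulated) processor."""
    return [entries[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map phase: count clicks per (cookie, ad) pair within one chunk."""
    counts = Counter()
    for cookie_id, ad_id, clicked in chunk:
        if clicked:
            counts[(cookie_id, ad_id)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce phase: merge two partial click counts into one."""
    return left + right


def clicks_per_cookie_ad(entries, n_chunks=3):
    """Run the map phase on every chunk, then aggregate the partial results."""
    # Sequential here; a real cluster would run map_chunk on each chunk in parallel.
    partials = [map_chunk(chunk) for chunk in split_log(entries, n_chunks)]
    return reduce(reduce_counts, partials, Counter())


totals = clicks_per_cookie_ad(LOG)
for (cookie_id, ad_id), clicks in sorted(totals.items()):
    print(cookie_id, ad_id, clicks)
```

        Because each chunk is counted independently and the partial counts are simply added together, the result does not depend on how many chunks the log is split into. That property is what allows the work to be spread across many processors.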