Page 414 - Using MIS
382 Chapter 9 Business Intelligence Systems
Figure 9-30
Third-Party Cookie Growth
Panels: (a) Display on Startup; (b) After MSN.com and Gmail; (c) Five Sites Visited Yield 27 Third Parties; (d) Sites Connected to doubleclick
Source: © Mozilla
Lightbeam will highlight it in the data column on the right. As you can see in Figure 9-30d, after visiting seven sites, doubleclick was connected to a total of 16 other sites, at most seven of which can be sites I visited. So doubleclick is connecting, on my computer, to sites I don't even know about. Examine the connection column on the right. I visited Msn, Amazon, Mynorthwest, and Wsj, but who are Bluekai and Rubiconproject? I had never heard of them until I saw this display. They, apparently, have heard of me, however!

Third-party cookies generate incredible volumes of log data. For example, suppose a company, such as doubleclick, shows 100 ads to a given computer in a day. If it is showing ads to 10 million computers (possible), that is a total of 1 billion log entries per day, or 365 billion a year. Truly this is Big Data.

Storage is essentially free, but how can they possibly process all that data? How do they parse the log to find entries just for your computer? How do they integrate data from different cookies on the same IP address? How do they analyze those entries to determine which ads you clicked on? How do they then characterize differences in ads to determine which characteristics matter most to you? The answer, as you learned in Q6, is to use parallel processing. Using a MapReduce algorithm, they distribute the work to thousands of processors that work in parallel. They then aggregate the results of these independent processors and then, possibly, move to a second phase of analysis where they do it again. Hadoop, the open-source program that you learned about in Q6, is a favorite for this process.

(See the collaboration exercise on pages 380–381 for a continuation of the discussion: third-party cookies, problem or opportunity?)

Questions

9-16. Using your own words, explain how third-party cookies are created.
9-17. Suppose you are an ad-serving company, and you maintain a log of cookie data for ads you serve to Web pages for a particular vendor (say Amazon).
  a. How can you use this data to determine which are the best ads?
  b. How can you use this data to determine which are the best ad formats?
  c. How could you use records of past ads and ad clicks to determine which ads to send to a given IP address?
  d. How could you use this data to determine how well the technique you used in your answer to question c was working?
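
To make the MapReduce idea from the discussion above concrete, here is a minimal sketch in Python. It is not Hadoop and not how any particular ad-serving company works; it only illustrates the pattern of splitting a log among independent workers and then aggregating their results. The log layout (computer ID, ad ID, clicked flag), the sample entries, and all function names are assumptions invented for this example. Threads stand in for the thousands of separate processors a real system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log entries: (computer_id, ad_id, clicked).
# These fields and values are invented for illustration only.
LOG = [
    ("pc-1", "ad-A", True),
    ("pc-2", "ad-A", False),
    ("pc-1", "ad-B", False),
    ("pc-3", "ad-B", True),
    ("pc-1", "ad-A", True),
    ("pc-2", "ad-B", True),
]


def split_into_chunks(entries, n_chunks):
    """Divide the log into roughly equal pieces, one per worker."""
    return [entries[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map step: a worker counts clicks per ad in its own chunk only."""
    counts = Counter()
    for _computer_id, ad_id, clicked in chunk:
        if clicked:
            counts[ad_id] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step: add up the partial results of the independent workers."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def clicks_per_ad(entries, n_workers=3):
    """Run the map step on each chunk in parallel, then reduce."""
    chunks = split_into_chunks(entries, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(map_chunk, chunks))
    return reduce_counts(partial)


if __name__ == "__main__":
    print(dict(clicks_per_ad(LOG)))
```

The result does not depend on how many workers share the log, which is what lets the work be spread across as many processors as are available. A second phase of analysis, as mentioned in the text, would feed these totals into another round of the same pattern.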