Page 397 - Using MIS

Q6  How Do Organizations Use BigData Applications?
                                       MapReduce

                                       Because BigData is huge, fast, and varied, it cannot be processed using traditional techniques.
                                       MapReduce is a technique for harnessing the power of thousands of computers working in par-
                                       allel. The basic idea is that the BigData collection is broken into pieces, and hundreds or thou-
                                       sands of independent processors search these pieces for something of interest. That process is
                                       referred to as the Map phase. In Figure 9-23, for example, a data set having the logs of Google
                                       searches is broken into pieces, and each independent processor is instructed to search for and
                                       count search keywords. Figure 9-23, of course, shows just a small portion of the data; here you
                                       can see a portion of the keywords that begin with H.
                                           As the processors finish, their results are combined in what is referred to as the Reduce
                                       phase. The result is a list of all the terms searched for on a given day and the count of each. The
                                       process is considerably more complex than described here, but this is the gist of the idea.
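The Map and Reduce phases described above can be sketched in a few lines of Python. This is only a minimal single-machine illustration, not how Google or Hadoop implements the technique: the segment contents are made-up sample data loosely modeled on Figure 9-23, and the names SEGMENTS, map_phase, reduce_phase, and keyword_totals are hypothetical. In a real cluster, each call to map_phase would run on a separate processor.

```python
from collections import Counter
from functools import reduce

# Hypothetical log segments (made-up sample data, loosely modeled on Figure 9-23).
SEGMENTS = [
    ["Halon", "Healthcare", "Hadoop", "Healthcare", "Honda", "Hadoop"],
    ["Healthcare", "Hadoop", "Picasso"],
    ["Hotel", "Honda", "Hurricane", "Hotel"],
]


def map_phase(segment):
    """Count the keywords in one segment, independently of all other segments."""
    return Counter(segment)


def reduce_phase(partial_counts):
    """Combine the per-segment counts into one total per keyword."""
    return reduce(lambda total, part: total + part, partial_counts, Counter())


def keyword_totals(segments):
    # Here the map calls run one after another; on a cluster they would
    # run in parallel, one segment per processor.
    partials = [map_phase(segment) for segment in segments]
    return reduce_phase(partials)


totals = keyword_totals(SEGMENTS)
for keyword, count in sorted(totals.items()):
    print(keyword, count)
```

Because each segment is counted without reference to any other segment, the Map work can be spread across as many processors as there are segments; only the Reduce step needs to see all of the partial results.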
                                           By the way, you can visit Google Trends to see an application of MapReduce. There you can
                                       obtain a trend line of the number of searches for a particular term or terms. Figure 9-24 shows the
                                       search trend for the term Web 2.0. The vertical axis is scaled; a value of 1.0 represents the average
                                       number of searches over that time period. This particular trend line supports the
                                       contention that the term Web 2.0 is fading from use. Go to www.google.com/trends and enter the
                                       terms Big Data, BigData, and Hadoop to see why learning about them is a better use of your time!
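The scaled vertical axis mentioned above, where 1.0 represents the average number of searches over the period, can be illustrated with a short calculation. This is a sketch of the scaling as the text describes it, not necessarily the formula Google Trends uses; the weekly counts are invented for the example and the function name scale_to_average is hypothetical.

```python
def scale_to_average(counts):
    """Divide each count by the period's mean, so 1.0 marks an average value."""
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    if mean == 0:
        # No searches at all in the period: report a flat line at zero.
        return [0.0 for _ in counts]
    return [count / mean for count in counts]


# Hypothetical weekly search counts for one term (made-up numbers).
weekly_counts = [50, 100, 150]
scaled = scale_to_average(weekly_counts)
print(scaled)
```

A scaled value above 1.0 means the term was searched more often than its average for the period, and a value below 1.0 means less often.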

                                       Hadoop

                                       Hadoop is an open source program supported by the Apache Software Foundation[15] that implements
                                       MapReduce on potentially thousands of computers. Hadoop could drive the process of find-
                                       ing and counting the Google search terms, but Google uses its own proprietary version of
                                       MapReduce to do so instead.
                                           Hadoop began as part of Nutch, an open source Web search project, but the Apache Software Foundation split it off to become its own
                                       product. Hadoop is written in Java and originally ran on Linux. Recently, Microsoft announced a



            Figure 9-23
            MapReduce Processing Summary

                                       Search log (excerpt, broken into log segments):
                                           … Halon; Wolverine; Abacus; Poodle; Fence; Acura; Healthcare;
                                           Cassandra; Belltown; Hadoop; Geranium; Stonework; Healthcare;
                                           Honda; Hadoop; Congress; Healthcare; Frigate; Metric; Clamp;
                                           Dell; Salmon; Hadoop; Picasso; Abba; …

                                       Map Phase (keyword counts found by each processor):
                                           Processor 1:               … Hadoop 14; Healthcare 85; Hiccup 17; Hurricane 8 …
                                           Processor 2:               … Hadoop 3; Healthcare 2; Honda 1 …
                                           …
                                           Processor 9,555 (+ or –):  … Halon 11; Hotel 175; Honda 87; Hurricane 53 …

                                       Reduce Phase:
                                           Keyword:        Total Count:
                                           …               …
                                           Hadoop              10,418
                                           Halon                4,788
                                           Healthcare      12,487,318
                                           Hiccup               7,435
                                           Honda              127,489
                                           Hotel              237,654
                                           Hurricane            2,799
                                           …


                                       15 A nonprofit corporation that supports open source software projects, originally those for the Apache Web
                                       server, but today for a large number of additional major software projects.