Page 111 - ASBIRES-2017_Preceedings

HADOOP BASED GRAPH ANALYTICS AND DATA ANALYTICS TOOLS ON MASSIVE OPEN ONLINE COURSES


   processors, assigns the input key value K1 that each processor would work on, and provides that processor with all the input data associated with that key value.
2. Run the user-provided Map() code – Map() is run exactly once for each K1 key value, generating output organized by key values K2.
3. "Shuffle" the Map output to the Reduce processors – the MapReduce system designates Reduce processors, assigns the K2 key value each processor should work on, and provides that processor with all the Map-generated data associated with that key value.
4. Run the user-provided Reduce() code – Reduce() is run exactly once for each K2 key value produced by the Map step.
5. Produce the final output – the MapReduce system collects all the Reduce output and sorts it by K2 to produce the final outcome.

The dataset analyzed comes from the Harvard/MIT MOOC system. For each user it contains the user ID, number of days active, number of videos watched, number of chapters covered, results obtained, education level, country, gender, and so on.

3.2 Data Analysis

Python was used with Qt Designer to develop an application that applies data mining algorithms to the data. Scikit-learn and NLTK (Natural Language Toolkit), both Python packages for data analysis, were used to process raw text data. Several classification algorithms were used, such as Logistic Regression, MNB_classifier, and MultinomialNB. These data mining tools were used to analyze the MOOC data efficiently.
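As a rough illustration of how the MapReduce steps listed above could be applied to records of the kind described for the MOOC dataset, the following sketch simulates the Map, Shuffle, Reduce, and final sort phases in a single Python process. It is not the implementation used in this work and it does not use Hadoop. The sample records, the field layout (user ID, education level, days active), and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample records: (user_id, education_level, days_active).
# These values are invented for illustration; they are not from the dataset.
RECORDS = [
    ("u1", "Bachelor's", 12),
    ("u2", "Master's", 30),
    ("u3", "Bachelor's", 4),
    ("u4", "Secondary", 7),
    ("u5", "Master's", 10),
]


def map_record(record):
    """Map step: called once per input record (K1 = user_id).

    Emits (K2, value) pairs, with K2 = education level.
    """
    _user_id, education_level, days_active = record
    yield education_level, days_active


def shuffle(pairs):
    """Shuffle step: gather all mapped values that share the same K2 key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: called once per K2 key; here it averages days active."""
    return key, sum(values) / len(values)


def run_job(records):
    """Run Map, Shuffle, and Reduce, then sort the final output by K2."""
    mapped = [pair for record in records for pair in map_record(record)]
    grouped = shuffle(mapped)
    reduced = [reduce_group(key, values) for key, values in grouped.items()]
    return sorted(reduced)


if __name__ == "__main__":
    for level, average_days in run_job(RECORDS):
        print(f"{level}: {average_days:.1f}")
```

In a real Hadoop job the Map and Reduce functions would run on separate processors and the framework would perform the shuffle; the sketch only mirrors the order of the steps and the roles of the K1 and K2 keys.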






























Figure 2: Results by educational level and number of days activated


