Page 113 - ASBIRES-2017_Preceedings

HADOOP BASED GRAPH ANALYTICS AND DATA ANALYTICS TOOLS ON MASSIVE OPEN ONLINE COURSES

Table 2: Results by number of videos watched

Classifier              60% training   70% training   80% training
MultinomialNB           0.97           0.93           0.93
BernoulliNB             96.67          96.67          96.67
SGDClassifier           6.65           95.32          96
SVC_classifier          96.77          71.81          95.65
LinearSVC_classifier    96.42          93.67          95.58
MNB_classifier          96.65          93.25          92.89
Logistic regression     96.68          95.93          95.06

Furthermore, the number of days active and the number of chapters used were identified as the most influential attributes in the above models.

5 DISCUSSION

This research project was carried out to analyze MOOC data through graphs and data mining techniques. With the help of this tool, new users can estimate their probability of completing a course. In addition, data mining techniques were used to develop the best model and to identify the attributes that most affect the results. Moreover, the tool provides attractive and user-friendly interfaces for data manipulation.

There is no dedicated tool for analyzing MOOC data. This system is more efficient and user-friendly than general-purpose data mining tools such as Weka. Those tools offer many data analysis techniques; however, their graph analysis is weak and they cannot be customized. In contrast, this system can be customized according to the data set.

5.1 Limitation of the System

High memory consumption: A large amount of memory (RAM) is required to run Hadoop and Python. Hadoop runs on a virtual machine, which needs almost 10 GB of RAM to work properly.

6 CONCLUSION

A MOOC tool was developed that helps to analyze MOOC data. Through graph analysis, users can view and compare different kinds of attributes efficiently. Furthermore, the system can be used to analyze the data efficiently with Hadoop and Hue. A number of graph patterns were produced to characterize student data by region, subject and course duration. Data mining techniques were used to develop a machine learning model to predict the pass rate. The logistic regression model gave the highest accuracy, 96.08%. Graph analytics and data analytics patterns could be used in future for the effective functioning of MOOC systems.
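
The comparison reported in Table 2 can be illustrated with a minimal sketch, assuming the column headings denote the share of records used for training and the rest is held out for testing. This is not the authors' implementation: the data is synthetic, the two features (days active, chapters used) and the rule that generates the pass label are assumptions made only for illustration, and a small pure-Python logistic regression stands in for the classifiers listed in the table.

```python
import math
import random


def make_synthetic_learners(n, seed=0):
    """Hypothetical data: (days_active, chapters_used) scaled to [0, 1] -> passed (0/1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        days = rng.randint(0, 60)
        chapters = rng.randint(0, 12)
        # Assumed rule, for illustration only: more activity, more likely to pass.
        score = 0.08 * days + 0.4 * chapters - 4.0 + rng.gauss(0, 1)
        rows.append(((days / 60.0, chapters / 12.0), 1 if score > 0 else 0))
    return rows


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, epochs=200, lr=0.5):
    """Fit two weights and a bias by plain batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    w, b = model
    correct = sum(
        1 for x, y in rows
        if (sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5) == (y == 1)
    )
    return correct / len(rows)


def evaluate_splits(rows, fractions=(0.6, 0.7, 0.8), seed=0):
    """Return {training fraction: held-out accuracy} for each split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    results = {}
    for frac in fractions:
        cut = int(len(shuffled) * frac)
        model = train_logistic(shuffled[:cut])
        results[frac] = accuracy(model, shuffled[cut:])
    return results


if __name__ == "__main__":
    data = make_synthetic_learners(500)
    for frac, acc in evaluate_splits(data).items():
        print(f"{int(frac * 100)}% training: accuracy {acc:.2%}")
```

Fixing the shuffle seed keeps the three splits nested, so a difference between columns reflects the amount of training data rather than a different random ordering. The accuracies printed here describe only the synthetic data and are not comparable to the figures in Table 2.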





