Page 111 - ASBIRES-2017_Preceedings
HADOOP BASED GRAPH ANALYTICS AND DATA ANALYTICS TOOLS ON MASSIVE OPEN
ONLINE COURSES
processors, assigns the input key value K1 that each processor would work on, and provides that processor with all the input data associated with that key value.
2. Run the user-provided Map() code: Map() is run exactly once for each K1 key value, generating output organized by key values K2.
3. "Shuffle" the Map output to the Reduce processors: the MapReduce system designates Reduce processors, assigns the K2 key value each processor should work on, and provides that processor with all the Map-generated data associated with that key value.
4. Run the user-provided Reduce() code: Reduce() is run exactly once for each K2 key value produced by the Map step.
5. Produce the final output: the MapReduce system collects all the Reduce output and sorts it by K2 to produce the final outcome.

The analysis is based on a dataset from the Harvard/MIT MOOC system. Each record contains the user ID, number of days active, number of videos watched, number of chapters covered, results obtained, and the user's education level, country, and gender, among other fields.

3.2 Data Analysis

Python was used with Qt Designer to develop an application that applies data mining algorithms to the data. Scikit-learn and NLTK (Natural Language Toolkit) were used to process raw text data. Both are Python packages for data analysis. Several classification algorithms were used, such as Logistic Regression, MNB_classfier, and MultinomialNB. These data mining tools were used to analyze the MOOC data efficiently.
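To make the classification step more concrete, the following is a minimal from-scratch sketch of multinomial naive Bayes with Laplace smoothing, written in plain Python. It only illustrates the kind of computation a classifier such as MultinomialNB performs on word counts. It is not the scikit-learn implementation and not the authors' code. The function names (`train_multinomial_nb`, `predict`), the smoothing value, and the toy comments are all assumptions introduced for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect per-class word counts; alpha is the Laplace smoothing constant."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # class -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_docs": class_docs, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha, "n_docs": len(docs)}


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(n / model["n_docs"])  # log prior
        for tok in tokens:
            # Smoothed log likelihood of each word given the class.
            score += math.log((counts[tok] + model["alpha"]) / total)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy comments; not taken from the MOOC dataset.
docs = [
    "great course clear videos",
    "loved the lectures and quizzes",
    "boring videos poor audio",
    "confusing lectures bad quizzes",
]
labels = ["positive", "positive", "negative", "negative"]

model = train_multinomial_nb(docs, labels)
prediction_pos = predict(model, "clear lectures great quizzes")
prediction_neg = predict(model, "poor audio boring")
```

In practice a library classifier would replace both functions. The sketch is kept dependency-free so that the counting and smoothing steps are visible.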
Figure 2: Results by educational level and number of days active
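The MapReduce steps listed earlier can be illustrated with a single-process sketch in plain Python. It is not Hadoop code and not the authors' job: it only mimics the map, shuffle, reduce, and sort phases in memory. The record fields (`education`, `days_active`), the sample values, and the helper names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical. The aggregation, average days active per education level, was chosen only because it resembles the grouping named in the Figure 2 caption.

```python
from collections import defaultdict


def map_fn(user_id, record):
    """Map(): called once per K1 (user id); emits (K2, value) pairs."""
    yield record["education"], record["days_active"]


def reduce_fn(education, days_list):
    """Reduce(): called once per K2 (education level)."""
    return education, sum(days_list) / len(days_list)


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Steps 1-2: hand each K1 and its data to Map() exactly once.
    mapped = []
    for k1, value in inputs.items():
        mapped.extend(map_fn(k1, value))
    # Step 3: shuffle, grouping the Map output by K2.
    groups = defaultdict(list)
    for k2, value in mapped:
        groups[k2].append(value)
    # Step 4: run Reduce() exactly once per K2.
    reduced = [reduce_fn(k2, values) for k2, values in groups.items()]
    # Step 5: collect the Reduce output and sort it by K2.
    return sorted(reduced)


# Hypothetical records keyed by user id; not real MOOC data.
records = {
    "u1": {"education": "Bachelor", "days_active": 10},
    "u2": {"education": "Master", "days_active": 4},
    "u3": {"education": "Bachelor", "days_active": 6},
    "u4": {"education": "Secondary", "days_active": 2},
}

result = run_mapreduce(records, map_fn, reduce_fn)
```

On a real cluster the map and reduce calls would run on separate processors and the shuffle would move data across the network. Here everything runs in one loop so the data flow from K1 to K2 is easy to follow.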