Page 10 - Data Science Algorithms in a Week
Playing chess - analysis with decision tree 65
Going shopping - dealing with data inconsistency 69
Summary 70
Problems 71
Chapter 4: Random Forest 75
Overview of random forest algorithm 76
Overview of random forest construction 76
Swim preference - analysis with random forest 77
Random forest construction 78
Construction of random decision tree number 0 78
Construction of random decision tree number 1 80
Classification with random forest 83
Implementation of random forest algorithm 83
Playing chess example 86
Random forest construction 88
Construction of a random decision tree number 0 88
Construction of random decision trees number 1, 2, 3 92
Going shopping - overcoming data inconsistency with randomness and measuring the level of confidence 94
Summary 96
Problems 97
Chapter 5: Clustering into K Clusters 102
Household incomes - clustering into k clusters 102
K-means clustering algorithm 103
Picking the initial k-centroids 104
Computing a centroid of a given cluster 104
k-means clustering algorithm on household income example 104
Gender classification - clustering to classify 105
Implementation of the k-means clustering algorithm 109
Input data from gender classification 112
Program output for gender classification data 112
House ownership - choosing the number of clusters 113
Document clustering - understanding the number of clusters k in a semantic context 119
Summary 126
Problems 126
Chapter 6: Regression 135
Fahrenheit and Celsius conversion - linear regression on perfect data 136
Weight prediction from height - linear regression on real-world data 139