Page 133 - Data Science Algorithms in a Week


Clustering into K Clusters

10 0.125 0.4011627907
11 0.125 1
12 0.625 0.0058139535
13 1 0
14 0.5 0.0058139535
15 0.375 0.0174418605
16 0.5 0.0174418605
17 0.75 0.0174418605
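Every value in the table lies in [0, 1], and in each column the smallest value is 0 and the largest is 1. That pattern fits min-max rescaling, x' = (x - min) / (max - min). The rescaling step itself is not shown in this excerpt, so the sketch below assumes that formula. The function name `min_max_rescale` and the raw counts in the example are hypothetical, not taken from the book's data set.

```python
def min_max_rescale(values):
    """Map values linearly onto [0, 1] using (x - min) / (max - min).

    Assumes min-max rescaling; this is a sketch, not the book's code.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: every value is the minimum, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical raw word counts for one column (not the book's data).
print(min_max_rescale([0, 1, 4, 8]))  # [0.0, 0.125, 0.5, 1.0]
```

Rescaling each column separately keeps a word with large raw counts from dominating the Euclidean distances that k-means relies on.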

Now that we have rescaled the data, let us apply the k-means clustering algorithm, trying to divide
the data into different numbers of clusters.
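The sketch below shows the k-means loop in Python. It assumes plain Lloyd iterations with Euclidean distance and uses the first k points as the initial centroids. The book's k-means_clustering.py may initialize centroids and report progress differently. The points are the 17 rows of word_frequencies_money_god_scaled.csv listed in this section. The names `kmeans` and `max_steps` are hypothetical.

```python
import math


def kmeans(points, k, max_steps=100):
    """Lloyd-style k-means sketch.

    Assumption: the first k points serve as the initial centroids.
    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_steps):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# The 17 rescaled (money, god) points from word_frequencies_money_god_scaled.csv.
points = [
    (0, 0.0406976744), (0, 0.0988372093), (0.125, 0.0581395349),
    (0, 0.1860465116), (0, 0.0348837209), (0, 0.1569767442),
    (0, 0.0348837209), (0.25, 0.3430232558), (0.25, 0.261627907),
    (0.125, 0.4011627907), (0.125, 1), (0.625, 0.0058139535),
    (1, 0), (0.5, 0.0058139535), (0.375, 0.0174418605),
    (0.5, 0.0174418605), (0.75, 0.0174418605),
]

doc_labels, doc_centroids = kmeans(points, 2)
print(doc_labels)
print(doc_centroids)
```

To try a different number of clusters, change only the second argument to `kmeans`. k-means can converge to different local optima depending on the starting centroids, so this sketch's groupings need not match the book's output.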

Input:

source_code/5/document_clustering/word_frequencies_money_god_scaled.csv
0,0.0406976744
0,0.0988372093
0.125,0.0581395349
0,0.1860465116
0,0.0348837209
0,0.1569767442
0,0.0348837209
0.25,0.3430232558
0.25,0.261627907
0.125,0.4011627907
0.125,1
0.625,0.0058139535
1,0
0.5,0.0058139535
0.375,0.0174418605
0.5,0.0174418605
0.75,0.0174418605
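As listed, the input file has no header row, and each line holds two comma-separated numbers. Going by the file name, these are presumably the rescaled money and god frequencies. The tuples in the program output, such as (0.0, 0.0406976744), match this layout. The sketch below reads such lines into float tuples with Python's standard `csv` module. The function name `load_points` is hypothetical, and an in-memory sample stands in for the real file.

```python
import csv
import io


def load_points(lines):
    """Parse two-column CSV lines into a list of (x, y) float tuples.

    `lines` can be any iterable of text lines, e.g. an open file object.
    Blank lines are skipped.
    """
    return [tuple(float(v) for v in row) for row in csv.reader(lines) if row]


# In-memory sample using the first rows of the listing above.
sample = io.StringIO("0,0.0406976744\n0.0,0.0988372093\n0.125,0.0581395349\n")
print(load_points(sample))
# [(0.0, 0.0406976744), (0.0, 0.0988372093), (0.125, 0.0581395349)]
```

For the real file, open it with `open(path, newline="")` and pass the file object to `load_points`.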
Output for 2 clusters:

$ python k-means_clustering.py document_clustering/word_frequencies_money_god_scaled.csv 2 last
The total number of steps: 3
The history of the algorithm:
Step number 0: point_groups = [((0.0, 0.0406976744), 0), ((0.0,
0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209),
0), ((0.25, 0.3430232558), 0), ((0.25, 0.261627907), 0), ((0.125,

