Page 136 - Data Science Algorithms in a Week
Clustering into K Clusters
This time the algorithm separated The Koran from the other religious books into a green
cluster. This is because the word god is in fact the 5th most frequent word in The Koran.
The clustering here happens to divide the books according to their writing style.
Clustering into 4 clusters takes one book with a relatively high frequency of the word
money out of the red cluster of non-religious books and places it in a separate
cluster. Let us look at the clustering into 5 clusters.
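As a reading aid for the history printed by the script: each entry of point_groups appears to be a pair of the form ((x, y), cluster_index), where (x, y) are the scaled word frequencies of one book. A minimal sketch, assuming that format, of how such a list could be regrouped by cluster index is shown here. The helper group_by_cluster is hypothetical and is not part of k-means_clustering.py; the sample values are the first few entries of step 0 in the output.

```python
from collections import defaultdict


def group_by_cluster(point_groups):
    """Map each cluster index to the list of points assigned to it.

    Assumes every entry has the form ((x, y), cluster_index),
    as in the history printed by the script.
    """
    clusters = defaultdict(list)
    for point, cluster in point_groups:
        clusters[cluster].append(point)
    return dict(clusters)


# A few entries copied from step 0 of the 5-cluster output.
sample = [
    ((0.0, 0.0406976744), 0),
    ((0.0, 0.0988372093), 0),
    ((0.125, 0.0581395349), 0),
    ((0.25, 0.3430232558), 4),
    ((0.25, 0.261627907), 4),
]

# Print each cluster index, its size, and its member points.
for cluster, points in sorted(group_by_cluster(sample).items()):
    print(cluster, len(points), points)
```

Grouping the points this way makes it easier to see which books share a cluster at each step than scanning the flat list.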
Output for 5 clusters:
$ python k-means_clustering.py word_frequencies_money_god_scaled.csv 5 last
The total number of steps: 2
The history of the algorithm:
Step number 0: point_groups = [((0.0, 0.0406976744), 0), ((0.0,
0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209),
0), ((0.25, 0.3430232558), 4), ((0.25, 0.261627907), 4), ((0.125,