Page 140 - Data Science Algorithms in a Week

Clustering into K Clusters

Analysis:

1. a) (1/3)*(2+3+4) = 3
b) (1/3)*(100$+400$+1000$) = 500$
c) ((10+40+0)/3, (20+60+40)/3) = (50/3, 120/3) = (50/3, 40)
d) ((200$+300$+500$+250$)/4, (40km+60km+100km+200km)/4) = (1250$/4, 400km/4) = (312.5$, 100km)
e) ((1+0+10+4+5)/5, (2+0+20+8+0)/5, (4+3+5+2+1)/5) = (4, 6, 3)
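
The centroids above are coordinate-wise means, which can be checked with a few lines of Python. This is a minimal sketch, not code from the book's source files; the helper name `centroid` is hypothetical, and units ($, km) are dropped so that the points are plain numbers.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    # zip(*points) groups the first coordinates together, then the second, and so on.
    return tuple(sum(coords) / n for coords in zip(*points))


# 1. a) one-dimensional points 2, 3, 4
print(centroid([(2,), (3,), (4,)]))
# 1. c) two-dimensional points
print(centroid([(10, 20), (40, 60), (0, 40)]))
# 1. d) (price, distance) pairs with units omitted
print(centroid([(200, 40), (300, 60), (500, 100), (250, 200)]))
# 1. e) three-dimensional points
print(centroid([(1, 2, 4), (0, 0, 3), (10, 20, 5), (4, 8, 2), (5, 0, 1)]))
```

The results should agree with the hand calculations: 3 for a), (312.5, 100) for d), and (4, 6, 3) for e).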

2. a) We add a second coordinate and set it to 0 for every data point. The
distances between the points do not change, so we can use the
clustering algorithm we implemented earlier in this chapter.
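
To see why the padding is harmless, the sketch below builds the two-dimensional points from the one-dimensional values and compares distances. It is an illustration only; the names `values`, `points_2d`, and `euclidean` are assumptions, not taken from the book's code.

```python
import math

# One-dimensional data from the problem, in the order used in the input file.
values = [0, 2, 5, 4, 8, 10, 12, 11]

# Add a second coordinate fixed at 0 for every point.
points_2d = [(float(v), 0.0) for v in values]


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The 2-D distance equals the 1-D distance |a - b| for every pair,
# because the second coordinates contribute (0 - 0)^2 = 0.
for i, a in enumerate(values):
    for j, b in enumerate(values):
        assert euclidean(points_2d[i], points_2d[j]) == abs(a - b)

print(points_2d)
```

Each tuple in `points_2d` corresponds to one row of the CSV input, for example `(5.0, 0.0)` to the row `5,0`.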

Input:

# source_code/5/problem5_2.csv
0,0
2,0
5,0
4,0
8,0
10,0
12,0
11,0

For 2 clusters:
$ python k-means_clustering.py problem5_2.csv 2 last
The total number of steps: 2
The history of the algorithm:
Step number 0: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
0), ((5.0, 0.0), 0), ((4.0, 0.0), 0), ((8.0, 0.0), 1), ((10.0,
0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
centroids = [(0.0, 0.0), (12.0, 0.0)]
Step number 1: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
0), ((5.0, 0.0), 0), ((4.0, 0.0), 0), ((8.0, 0.0), 1), ((10.0,
0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
centroids = [(2.75, 0.0), (10.25, 0.0)]
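
The centroids reported for 2 clusters can be reproduced by hand with one assignment step and one update step. The following is a minimal one-dimensional sketch, not the book's k-means_clustering.py; the function names `assign` and `update` are hypothetical, and the initial centroids 0 and 12 are read off Step number 0 of the output above.

```python
values = [0, 2, 5, 4, 8, 10, 12, 11]
initial_centroids = [0.0, 12.0]  # taken from Step number 0 of the output


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
        for v in values
    ]


def update(values, groups, k):
    """Recompute each centroid as the mean of its members.

    Assumes no cluster is empty, which holds for this data set.
    """
    new_centroids = []
    for c in range(k):
        members = [v for v, g in zip(values, groups) if g == c]
        new_centroids.append(sum(members) / len(members))
    return new_centroids


groups = assign(values, initial_centroids)
centroids = update(values, groups, 2)
print(groups)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [2.75, 10.25]

# Reassigning with the updated centroids leaves the groups unchanged,
# so the procedure has converged.
print(assign(values, centroids) == groups)  # True
```

The means (0+2+5+4)/4 = 2.75 and (8+10+12+11)/4 = 10.25 match the first coordinates of the centroids printed in Step number 1.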

For 3 clusters:

$ python k-means_clustering.py problem5_2.csv 3 last
The total number of steps: 2
The history of the algorithm:
Step number 0: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
0), ((5.0, 0.0), 2), ((4.0, 0.0), 2), ((8.0, 0.0), 2), ((10.0,
