$(C_1, \ldots, C_k)$. Hence,

$$\sum_{i=1}^{k} \sum_{x \in C_i^{(t)}} \left\| x - \mu_i^{(t-1)} \right\|^2 \;\le\; \sum_{i=1}^{k} \sum_{x \in C_i^{(t-1)}} \left\| x - \mu_i^{(t-1)} \right\|^2 . \tag{22.5}$$

Using Equation (22.3) we have that the right-hand side of Equation (22.5) equals $G(C_1^{(t-1)}, \ldots, C_k^{(t-1)})$. Combining this with Equation (22.4) and Equation (22.5), we obtain that $G(C_1^{(t)}, \ldots, C_k^{(t)}) \le G(C_1^{(t-1)}, \ldots, C_k^{(t-1)})$, which concludes our proof.
                 While the preceding lemma tells us that the k-means objective is monotonically
              nonincreasing, there is no guarantee on the number of iterations the k-means algo-
rithm needs in order to reach convergence. Furthermore, there is no nontrivial upper
bound on the gap between the value of the k-means objective at the algorithm's
output and the minimum possible value of that objective function; the output may
be arbitrarily far from optimal. In fact, k-means
              might converge to a point which is not even a local minimum (see Exercise 22.2).
              To improve the results of k-means it is often recommended to repeat the procedure
              several times with different randomly chosen initial centroids (e.g., we can choose
              the initial centroids to be random points from the data).
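A minimal NumPy sketch of this restart heuristic follows, assuming the Euclidean distance and a data matrix X of shape (m, d); the function names (kmeans_once, kmeans_restarts) and the default parameters are illustrative choices, not part of the book's algorithm statement.

```python
import numpy as np

def kmeans_once(X, k, n_iters=100, rng=None):
    """One run of the k-means iterations, initialized with k random data points."""
    rng = np.random.default_rng() if rng is None else rng
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every point to its closest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it.
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Sum of squared distances of each point to its assigned centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    objective = dists[np.arange(len(X)), labels].sum()
    return labels, centroids, objective

def kmeans_restarts(X, k, n_restarts=10, seed=0):
    """Run k-means from several random initializations; keep the smallest objective."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng=rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])
```

Keeping the run with the smallest objective value does not remove the lack of worst-case guarantees discussed above, but it often improves the returned clustering in practice.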


              22.3 SPECTRAL CLUSTERING

Often, a convenient way to represent the relationships between points in a data set
$X = \{x_1, \ldots, x_m\}$ is by a similarity graph; each vertex represents a data point $x_i$, and
every two vertices are connected by an edge whose weight is their similarity, $W_{i,j} =
s(x_i, x_j)$, where $W \in \mathbb{R}^{m,m}$. For example, we can set $W_{i,j} = \exp\!\left(- d(x_i, x_j)^2 / \sigma^2\right)$,
where $d(\cdot,\cdot)$ is a distance function and $\sigma$ is a parameter. The clustering problem can
now be formulated as follows: We want to find a partition of the graph such that the
edges between different groups have low weights and the edges within a group have
high weights.
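As a concrete illustration, the following minimal NumPy sketch builds such a Gaussian similarity matrix, assuming $d$ is the Euclidean distance; the function name and the choice $\sigma = 1$ are purely illustrative.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """W[i, j] = exp(-||x_i - x_j||^2 / sigma^2) for a data matrix X of shape (m, d)."""
    # Pairwise squared Euclidean distances, shape (m, m).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

# Two nearby points and one far-away point: the similarity graph has a strong
# edge between the first two vertices and only weak edges to the third.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gaussian_similarity(X, sigma=1.0)
print(np.round(W, 3))
```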
                 In the clustering objectives described previously, the focus was on one side of
              our intuitive definition of clustering – making sure that points in the same cluster
              are similar. We now present objectives that focus on the other requirement – points
              separated into different clusters should be nonsimilar.


              22.3.1 Graph Cut
              Given a graph represented by a similarity matrix W, the simplest and most direct
              way to construct a partition of the graph is to solve the mincut problem, which
chooses a partition $C_1, \ldots, C_k$ that minimizes the objective

$$\operatorname{cut}(C_1, \ldots, C_k) = \sum_{i=1}^{k} \sum_{r \in C_i,\, s \notin C_i} W_{r,s} .$$
              For k = 2, the mincut problem can be solved efficiently. However, in practice it
              often does not lead to satisfactory partitions. The problem is that in many cases, the
              solution of mincut simply separates one individual vertex from the rest of the graph.
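To make this last point concrete, here is a minimal NumPy sketch that evaluates the cut objective for two partitions of a small graph; the similarity matrix, the helper name cut_value, and the chosen partitions are toy values for illustration only.

```python
import numpy as np

def cut_value(W, clusters):
    """cut(C_1, ..., C_k): total weight of edges leaving each cluster."""
    vertices = np.arange(W.shape[0])
    total = 0.0
    for C in clusters:
        outside = np.setdiff1d(vertices, C)
        total += W[np.ix_(C, outside)].sum()  # sum over r in C_i, s not in C_i
    return total

# Toy graph: two tightly connected pairs {0, 1} and {2, 3}, joined by weaker
# edges, plus a vertex 4 that is only weakly connected to everything.
W = np.array([[0.0, 0.9, 0.2, 0.2, 0.05],
              [0.9, 0.0, 0.2, 0.2, 0.05],
              [0.2, 0.2, 0.0, 0.9, 0.05],
              [0.2, 0.2, 0.9, 0.0, 0.05],
              [0.05, 0.05, 0.05, 0.05, 0.0]])

# The "meaningful" partition separates the two tight pairs...
print(cut_value(W, [np.array([0, 1, 4]), np.array([2, 3])]))   # ~1.8
# ...but mincut prefers to split off the weakly connected vertex 4.
print(cut_value(W, [np.array([0, 1, 2, 3]), np.array([4])]))   # ~0.4
```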