due to Johnson and Lindenstrauss, showing that random projections do not distort
Euclidean distances too much.
Let $x_1, x_2$ be two vectors in $\mathbb{R}^d$. A matrix $W$ does not distort too much the distance between $x_1$ and $x_2$ if the ratio
$$\frac{\|Wx_1 - Wx_2\|}{\|x_1 - x_2\|}$$
is close to 1. In other words, the distances between $x_1$ and $x_2$ before and after the transformation are almost the same. To show that $\|Wx_1 - Wx_2\|$ is not too far away from $\|x_1 - x_2\|$ it suffices to show that $W$ does not distort the norm of the difference vector $x = x_1 - x_2$. Therefore, from now on we focus on the ratio $\frac{\|Wx\|}{\|x\|}$.
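As a quick illustration (not part of the original text), the following Python sketch draws a random Gaussian projection matrix with $N(0,1/n)$ entries, as in Lemma 23.4 below, and checks how little it distorts the distance between two arbitrary vectors. The dimensions and the random seed are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the text): original dimension d,
# projected dimension n.
d, n = 10_000, 1_000
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)

# Random projection with i.i.d. N(0, 1/n) entries, as in Lemma 23.4 below.
W = rng.standard_normal((n, d)) / np.sqrt(n)

# Ratio of distances after / before the projection; typically close to 1.
ratio = np.linalg.norm(W @ x1 - W @ x2) / np.linalg.norm(x1 - x2)
print(ratio)
```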
We start by analyzing the distortion caused by applying a random projection to a single vector.
Lemma 23.3. Fix some $x \in \mathbb{R}^d$. Let $W \in \mathbb{R}^{n,d}$ be a random matrix such that each $W_{i,j}$ is an independent normal random variable. Then, for every $\epsilon \in (0,3)$ we have
$$\mathbb{P}\left[\,\left|\frac{\|(1/\sqrt{n})\,Wx\|^2}{\|x\|^2} - 1\right| > \epsilon\,\right] \;\le\; 2e^{-\epsilon^2 n/6}.$$
Proof. Without loss of generality we can assume that $\|x\|^2 = 1$. Therefore, an equivalent inequality is
$$\mathbb{P}\left[(1-\epsilon)\,n \le \|Wx\|^2 \le (1+\epsilon)\,n\right] \;\ge\; 1 - 2e^{-\epsilon^2 n/6}.$$
Let $w_i$ be the $i$th row of $W$. The random variable $\langle w_i, x\rangle$ is a weighted sum of $d$ independent normal random variables and therefore it is normally distributed with zero mean and variance $\sum_j x_j^2 = \|x\|^2 = 1$. Therefore, the random variable $\|Wx\|^2 = \sum_{i=1}^{n} (\langle w_i, x\rangle)^2$ has a $\chi^2_n$ distribution. The claim now follows directly from a measure concentration property of $\chi^2$ random variables stated in Lemma B.12 given in Section B.7.
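To make the concentration statement concrete, here is a hedged Python sketch (not from the text) that estimates the failure probability in Lemma 23.3 by simulation and compares it with the bound $2e^{-\epsilon^2 n/6}$. The dimensions, $\epsilon$, and trial count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the text): ambient dimension d,
# projected dimension n, accuracy parameter epsilon, number of trials.
d, n, eps, trials = 200, 100, 0.3, 5000

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                    # w.l.o.g. ||x|| = 1, as in the proof

failures = 0
for _ in range(trials):
    W = rng.standard_normal((n, d))       # each W_{i,j} is an independent N(0,1)
    val = np.linalg.norm(W @ x) ** 2 / n  # = ||(1/sqrt(n)) W x||^2, a chi^2_n / n variable
    if abs(val - 1) > eps:
        failures += 1

# Empirical failure frequency vs. the bound 2 exp(-eps^2 n / 6) of Lemma 23.3.
print(failures / trials, 2 * np.exp(-eps**2 * n / 6))
```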
The Johnson-Lindenstrauss lemma follows from this using a simple union bound
argument.
Lemma 23.4 (Johnson-Lindenstrauss Lemma). Let $Q$ be a finite set of vectors in $\mathbb{R}^d$. Let $\delta \in (0,1)$ and $n$ be an integer such that
$$\epsilon = \sqrt{\frac{6\log(2|Q|/\delta)}{n}} \le 3.$$
Then, with probability of at least $1-\delta$ over a choice of a random matrix $W \in \mathbb{R}^{n,d}$ such that each element of $W$ is distributed normally with zero mean and variance of $1/n$ we have
$$\sup_{x \in Q} \left|\frac{\|Wx\|^2}{\|x\|^2} - 1\right| < \epsilon.$$
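The following Python sketch (an illustration under assumed parameters, not part of the text) picks the smallest $n$ for which the $\epsilon$ of Lemma 23.4 falls below a chosen target, draws $W$ with $N(0,1/n)$ entries, and reports the worst-case norm distortion over an arbitrarily generated finite set $Q$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choices (not from the text): |Q| = 50 vectors in R^1000,
# confidence parameter delta, and a target distortion level.
d, m, delta, target_eps = 1000, 50, 0.1, 0.5
Q = rng.standard_normal((m, d))

# Smallest n with sqrt(6 log(2|Q|/delta) / n) <= target_eps, per Lemma 23.4.
n = int(np.ceil(6 * np.log(2 * m / delta) / target_eps**2))

W = rng.standard_normal((n, d)) / np.sqrt(n)   # entries ~ N(0, 1/n)

projected_sq = np.sum((Q @ W.T) ** 2, axis=1)  # ||W x||^2 for each x in Q
original_sq = np.sum(Q ** 2, axis=1)           # ||x||^2 for each x in Q
distortion = np.max(np.abs(projected_sq / original_sq - 1))

# With probability at least 1 - delta, distortion < target_eps.
print(n, distortion)
```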