Page 308 - Understanding Machine Learning
Dimensionality Reduction
Proof of Theorem 23.9
To prove the theorem we follow an approach due to (Baraniuk, Davenport, DeVore
& Wakin 2008). The idea is to combine the Johnson-Lindenstrauss (JL) lemma with
a simple covering argument.
We start with a covering property of the unit ball.
Lemma 23.11 Let $\epsilon \in (0,1)$. There exists a finite set $Q \subset \mathbb{R}^d$ of size $|Q| \le (3/\epsilon)^d$ such that
$$\sup_{x : \|x\| \le 1} \ \min_{v \in Q} \|x - v\| \le \epsilon.$$
Proof. Let $k$ be an integer and let
$$Q' = \left\{ x \in \mathbb{R}^d : \forall j \in [d],\ \exists i \in \{-k, -k+1, \ldots, k\} \text{ s.t. } x_j = \tfrac{i}{k} \right\}.$$
Clearly, $|Q'| = (2k+1)^d$. We shall set $Q = Q' \cap B_2(1)$, where $B_2(1)$ is the unit $\ell_2$ ball of $\mathbb{R}^d$. Since the points in $Q'$ are distributed evenly on the unit $\ell_\infty$ ball, the size of $Q$ is the size of $Q'$ times the ratio between the volumes of the unit $\ell_2$ and $\ell_\infty$ balls. The volume of the $\ell_\infty$ ball is $2^d$ and the volume of $B_2(1)$ is
$$\frac{\pi^{d/2}}{\Gamma(1 + d/2)}.$$
For simplicity, assume that $d$ is even and therefore
$$\Gamma(1 + d/2) = (d/2)! \ge \left(\frac{d/2}{e}\right)^{d/2},$$
where in the last inequality we used Stirling's approximation. Overall we obtained that
$$|Q| \le (2k+1)^d \,(\pi/e)^{d/2}\,(d/2)^{-d/2}\,2^{-d}. \tag{23.10}$$
Now let us specify $k$. For each $x \in B_2(1)$ let $v \in Q$ be the vector whose $i$th element is $\operatorname{sign}(x_i)\,\lfloor |x_i|\,k \rfloor / k$. Then, for each element we have that $|x_i - v_i| \le 1/k$ and thus
$$\|x - v\| \le \frac{\sqrt{d}}{k}.$$
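The rounding step above is easy to check numerically. The following sketch (not part of the original proof) rounds each coordinate of a unit vector toward zero to a multiple of $1/k$ and verifies the stated bounds; the helper name `round_to_grid` is ours.

```python
import numpy as np

def round_to_grid(x, k):
    """Return v with v_i = sign(x_i) * floor(|x_i| * k) / k."""
    return np.sign(x) * np.floor(np.abs(x) * k) / k

rng = np.random.default_rng(0)
d, k = 10, 25
x = rng.normal(size=d)
x /= np.linalg.norm(x)          # a point in the unit ell_2 ball

v = round_to_grid(x, k)
assert np.all(np.abs(x - v) <= 1 / k)            # coordinate-wise bound
assert np.linalg.norm(x - v) <= np.sqrt(d) / k   # ||x - v|| <= sqrt(d)/k
assert np.linalg.norm(v) <= 1                    # |v_i| <= |x_i|, so v stays in B_2(1), hence v is in Q
```

Note that rounding toward zero (rather than to the nearest grid point) is what guarantees $\|v\| \le \|x\| \le 1$, so the rounded point indeed belongs to $Q$.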
To ensure that the right-hand side will be at most $\epsilon$ we shall set $k = \sqrt{d}/\epsilon$. Plugging this value into Equation (23.10) we conclude that
$$|Q| \le \left(\frac{3\sqrt{d}}{2\epsilon}\right)^d (\pi/e)^{d/2} (d/2)^{-d/2} = \left(\frac{3}{\epsilon}\right)^d \left(\frac{\pi}{2e}\right)^{d/2} \le \left(\frac{3}{\epsilon}\right)^d.$$
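For small $d$ the covering bound can be checked by brute force. The sketch below (illustrative only; the helper name `covering_size` is ours) counts the grid points of $Q = Q' \cap B_2(1)$ and compares the count with $(3/\epsilon)^d$. Since the proof sets $k = \sqrt{d}/\epsilon$, which need not be an integer, we round $k$ up here.

```python
import itertools
import math

def covering_size(d, eps):
    """Count the points of the grid Q' = {i/k : i in {-k,...,k}}^d
    that lie inside the unit ell_2 ball, with k = ceil(sqrt(d)/eps)."""
    k = math.ceil(math.sqrt(d) / eps)
    grid = range(-k, k + 1)
    return sum(
        1
        for point in itertools.product(grid, repeat=d)
        if sum((i / k) ** 2 for i in point) <= 1.0
    )

d, eps = 2, 0.5
size = covering_size(d, eps)
bound = (3 / eps) ** d
assert size <= bound   # 29 grid points vs. the bound 36
```

For $d = 2$ and $\epsilon = 0.5$ the grid has $29$ points inside the unit disk, comfortably below the bound $(3/\epsilon)^2 = 36$; the enumeration is exponential in $d$, so this check is only feasible in very low dimension.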
Let $x$ be a vector that can be written as $x = U\alpha$ with $U$ being some orthonormal matrix and $\|\alpha\|_0 \le s$. Combining the earlier covering property and the JL lemma (Lemma 23.4) enables us to show that a random $W$ will not distort any such $x$.
Lemma 23.12 Let $U$ be an orthonormal $d \times d$ matrix and let $I \subset [d]$ be a set of indices of size $|I| = s$. Let $S$ be the span of $\{U_i : i \in I\}$, where $U_i$ is the $i$th column of $U$. Let $\delta \in (0,1)$, $\epsilon \in (0,1)$, and $n \in \mathbb{N}$ such that
$$n \ge 24\,\frac{\log(2/\delta) + s \log(12/\epsilon)}{\epsilon^2}.$$
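The sample-size condition of Lemma 23.12 is easy to evaluate numerically. The helper below (our own name, with the constants taken verbatim from the displayed bound) computes the smallest admissible $n$ and makes the scaling visible: $n$ grows linearly in the sparsity $s$ and like $1/\epsilon^2$ in the accuracy.

```python
import math

def required_n(s, eps, delta):
    """Smallest integer n with n >= 24 * (log(2/delta) + s*log(12/eps)) / eps^2."""
    return math.ceil(24 * (math.log(2 / delta) + s * math.log(12 / eps)) / eps**2)

# Larger sparsity s, or smaller accuracy eps, both force a larger n.
assert required_n(20, 0.1, 0.01) > required_n(10, 0.1, 0.01)
assert required_n(10, 0.05, 0.01) > required_n(10, 0.1, 0.01)
```

Note that the dependence on the failure probability $\delta$ is only logarithmic, which is what makes the union bound over all index sets $I$ affordable later in the proof.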