Page 308 - Understanding Machine Learning
P. 308

Dimensionality Reduction
           290

                 Proof of Theorem 23.9
                 To prove the theorem we follow an approach due to (Baraniuk, Davenport, DeVore
                 & Wakin 2008). The idea is to combine the Johnson-Lindenstrauss (JL) lemma with
                 a simple covering argument.
                    We start with a covering property of the unit ball.
                                                                                    d
                                                                    d             3
                 Lemma 23.11 Let 
 ∈ (0,1). There exists a finite set Q ⊂ R of size |Q|≤  such

                 that
                                            sup  min  x − v ≤ 
.
                                           x: x ≤1  v∈Q
                 Proof. Let k be an integer and let
                                                                            i
                                      d
                             Q ={x ∈ R : ∀ j ∈ [d],∃i ∈{−k,−k + 1,...,k} s.t. x j = }.

                                                                            k
                                     d


                 Clearly, |Q |= (2k +1) .Weshall set Q = Q ∩ B 2 (1), where B 2 (1) is the unit   2 ball
                     d
                 of R . Since the points in Q are distributed evenly on the unit   ∞ ball, the size of Q

                 is the size of Q times the ratio between the volumes of the unit   2 and   ∞ balls. The

                                       d
                 volume of the   ∞ ball is 2 and the volume of B 2 (1) is
                                                   π d/2
                                                          .
                                                 (1 + d/2)
                 For simplicity, assume that d is even and therefore
                                                                  d/2
                                                             d/2
                                       (1 + d/2) = (d/2)! ≥         ,
                                                              e
                 where in the last inequality we used Stirling’s approximation. Overall we obtained
                 that
                                                 d
                                                                  2
                                     |Q|≤ (2k + 1) (π/e) d/2  (d/2) −d/2 −d .      (23.10)
                 Now lets us specify k. For each x ∈ B 2 (1) let v ∈ Q be the vector whose ith element
                 is sign(x i ) |x i |k /k. Then, for each element we have that |x i − v i |≤ 1/k and thus
                                                        √
                                                          d
                                                x − v ≤    .
                                                         k
                                                                           √
                 To ensure that the right-hand side will be at most 
 we shall set k =  d/
 . Plugging
                 this value into Equation (23.10) we conclude that
                                   √                                d        d
                                           d
                            |Q|≤ (3 d/(2
)) (π/e) d/2  (d/2) −d/2  =  3  π  ≤  3  .
                                                                
  2e
                    Let x be a vector that can be written as x = Uα with U being some orthonormal
                 matrix and  α  0 ≤ s. Combining the earlier covering property and the JL lemma
                 (Lemma 23.4) enables us to show that a random W will not distort any such x.

                 Lemma 23.12 Let U be an orthonormal d × d matrix and let I ⊂ [d] be a set of
                 indices of size |I|= s.Let S be the span of {U i : i ∈ I},where U i is the ith column of
                 U.Let δ ∈ (0,1), 
 ∈ (0,1),and n ∈ N such that
                                              log(2/δ) + s log(12/
)
                                        n ≥ 24                    .
                                                       
 2
   303   304   305   306   307   308   309   310   311   312   313