
Dimensionality Reduction

                 due to Johnson and Lindenstrauss, showing that random projections do not distort
                 Euclidean distances too much.
   Let $x_1, x_2$ be two vectors in $\mathbb{R}^d$. A matrix $W$ does not distort too much the
distance between $x_1$ and $x_2$ if the ratio
$$\frac{\|Wx_1 - Wx_2\|}{\|x_1 - x_2\|}$$
is close to 1. In other words, the distances between $x_1$ and $x_2$ before and after the
transformation are almost the same. To show that $\|Wx_1 - Wx_2\|$ is not too far away
from $\|x_1 - x_2\|$ it suffices to show that $W$ does not distort the norm of the difference
vector $x = x_1 - x_2$. Therefore, from now on we focus on the ratio $\frac{\|Wx\|}{\|x\|}$.
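
For intuition, the following sketch draws such a random matrix and checks the ratio numerically (a minimal illustration, assuming NumPy; the dimensions $d$, $n$ and the two vectors are arbitrary choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 300                    # original and reduced dimensions

x1, x2 = rng.normal(size=d), rng.normal(size=d)

# Random projection: each entry of W is N(0, 1/n), as in Lemma 23.4 below.
W = rng.normal(scale=1 / np.sqrt(n), size=(n, d))

ratio = np.linalg.norm(W @ x1 - W @ x2) / np.linalg.norm(x1 - x2)
print(f"distortion ratio: {ratio:.4f}")   # typically close to 1
```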
   We start by analyzing the distortion caused by applying a random projection
                 to a single vector.
Lemma 23.3. Fix some $x \in \mathbb{R}^d$. Let $W \in \mathbb{R}^{n,d}$ be a random matrix such that each $W_{i,j}$
is an independent normal random variable. Then, for every $\epsilon \in (0,3)$ we have
$$P\left[\,\left|\frac{\|(1/\sqrt{n})\,Wx\|^2}{\|x\|^2} - 1\right| > \epsilon\,\right] \le 2e^{-\epsilon^2 n/6}.$$
Proof. Without loss of generality we can assume that $\|x\|^2 = 1$. Therefore, an
equivalent inequality is
$$P\left[(1-\epsilon)\,n \le \|Wx\|^2 \le (1+\epsilon)\,n\right] \ge 1 - 2e^{-\epsilon^2 n/6}.$$
Let $w_i$ be the $i$th row of $W$. The random variable $\langle w_i, x \rangle$ is a weighted sum of
$d$ independent normal random variables and therefore it is normally distributed
with zero mean and variance $\sum_j x_j^2 = \|x\|^2 = 1$. Therefore, the random variable
$\|Wx\|^2 = \sum_{i=1}^{n} \langle w_i, x \rangle^2$ has a $\chi^2_n$ distribution. The claim now follows directly from
a measure concentration property of $\chi^2$ random variables stated in Lemma B.12
given in Section B.7.
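
As a sanity check, the following sketch estimates the failure probability in Lemma 23.3 by simulation and compares it to the bound $2e^{-\epsilon^2 n/6}$ (a minimal illustration, assuming NumPy; $d$, $n$, $\epsilon$, and the number of trials are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, eps, trials = 200, 100, 0.3, 2_000

x = rng.normal(size=d)
x /= np.linalg.norm(x)                # WLOG ||x|| = 1, as in the proof

# Each entry of W is standard normal; the lemma rescales Wx by 1/sqrt(n).
failures = 0
for _ in range(trials):
    W = rng.normal(size=(n, d))
    if abs(np.linalg.norm(W @ x) ** 2 / n - 1) > eps:
        failures += 1

print(f"empirical failure rate:  {failures / trials:.4f}")
print(f"bound 2*exp(-eps^2*n/6): {2 * np.exp(-eps**2 * n / 6):.4f}")
```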
                    The Johnson-Lindenstrauss lemma follows from this using a simple union bound
                 argument.
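
Spelled out (a worked step that is implicit in the text): applying Lemma 23.3 to each $x \in Q$ and taking a union bound over the $|Q|$ bad events gives
$$P\left[\exists x \in Q:\ \left|\frac{\|Wx\|^2}{\|x\|^2} - 1\right| > \epsilon\right] \le |Q| \cdot 2e^{-\epsilon^2 n/6},$$
and setting the right-hand side equal to $\delta$ and solving for $\epsilon$ yields exactly $\epsilon = \sqrt{6\log(2|Q|/\delta)/n}$, the choice made in the lemma below.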

Lemma 23.4 (Johnson-Lindenstrauss Lemma). Let $Q$ be a finite set of vectors in
$\mathbb{R}^d$. Let $\delta \in (0,1)$ and $n$ be an integer such that
$$\epsilon = \sqrt{\frac{6\log(2|Q|/\delta)}{n}} \le 3.$$
Then, with probability of at least $1-\delta$ over a choice of a random matrix $W \in \mathbb{R}^{n,d}$
such that each element of $W$ is distributed normally with zero mean and variance of
$1/n$, we have
$$\sup_{x \in Q} \left|\frac{\|Wx\|^2}{\|x\|^2} - 1\right| < \epsilon.$$
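
To see the lemma end to end, the following sketch draws $W$ with $N(0,1/n)$ entries, computes $\epsilon$ from the formula above, and checks the supremum over $Q$ (a minimal illustration, assuming NumPy; the sizes of $Q$, $d$, $n$, and $\delta$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, num_points, delta = 1_000, 400, 50, 0.1

Q = rng.normal(size=(num_points, d))  # a finite set of vectors in R^d

eps = np.sqrt(6 * np.log(2 * num_points / delta) / n)
assert eps <= 3                       # condition of Lemma 23.4

# Each entry of W is distributed N(0, 1/n).
W = rng.normal(scale=1 / np.sqrt(n), size=(n, d))

# ||W x||^2 / ||x||^2 for every x in Q; the rows of Q @ W.T are the W x's.
ratios = np.linalg.norm(Q @ W.T, axis=1) ** 2 / np.linalg.norm(Q, axis=1) ** 2
print(f"eps = {eps:.3f}, sup distortion = {np.abs(ratios - 1).max():.3f}")
# With probability at least 1 - delta, the sup is smaller than eps.
```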