Page 306 - Understanding Machine Learning

P. 306

Dimensionality Reduction
288
c
values of x (ties are broken arbitrarily). Let T =[d]\T 0 . Next, T 1 will be the s indices
0
c
corresponding to the s largest elements in absolute value of h T .Let T 0,1 = T 0 ∪ T 1
0
and T c = [d] \ T 0,1 . Next, T 2 will correspond to the s largest elements in absolute
0,1
value of h T . And, we will construct T 3 ,T 4 ,... in the same way.
c
0,1
To prove the theorem we ﬁrst need the following lemma, which shows that RIP
also implies approximate orthogonality.
Lemma 23.10 Let W be an (
,2s)-RIP matrix. Then, for any two disjoint sets I, J,
both of size at most s, and for any vector u we have that Wu I ,Wu J ≤
u I 2 u J 2 .
Proof. W.l.o.g. assume u I 2 = u J 2 = 1.

2 2
Wu I + Wu J −⊂Wu I − Wu J 2
2
Wu I ,Wu J = .
4
2
But, since |J ∪ I|≤ 2s we get from the RIP condition that Wu I + Wu J ≤ (1 +
2
2 2 2 2 2

)( u I + u J ) = 2(1 +
)and that −⊂Wu I − Wu J ≤−(1 −
)( u I + u J ) =
2 2 2 2 2
−2(1 −
), which concludes our proof.
We are now ready to prove the theorem. Clearly,
+ h T 2 ≤√h T 0,1 2 + h T 2 . (23.5)
c

c
h 2 = h T 0,1
0,1 0,1
To prove the theorem we will show the following two claims:
Claim 1: h T 2 ≤√h T 0 2 + 2s −1/2 x − x s 1 .
c

0,1
2ρ −1/2

Claim 2: h T 0,1 2 ≤ s x − x s 1 .
1−ρ
Combining these two claims with Equation (23.5) we get that
h 2 ≤√h T 0,1 2 + h T 2 ≤ 2 h T 0,1 2 + 2s −1/2 x − x s 1

c
0,1

2ρ −1/2
≤ 2 + 1 s x − x s 1
1−ρ
1 + ρ −1/2
= 2 s x − x s 1 ,
1 − ρ
and this will conclude our proof.
Proving Claim 1:
To prove this claim we do not use the RIP condition at all but only use the fact that

x minimizes the 1 norm. Take j > 1. For each i ∈ T j and i ∈ T j−1 we have that
|h i |≤ |h i |. Therefore, h T j ∞ ≤√h T j−1 1 /s. Thus,

h T j 2 ≤ s 1/2 h T j ∞ ≤ s −1/2 h T j−1 1 .

Summing this over j = 2,3,... and using the triangle inequality we obtain that

c
h T 2 ≤ h T j 2 ≤ s −1/2 h T 1 (23.6)

c
0,1 0
j≥2

301 302 303 304 305 306 307 308 309 310 311