Page 309 - Understanding Machine Learning
23.3 Compressed Sensing 291
Then, with probability of at least $1 - \delta$ over a choice of a random matrix $W \in \mathbb{R}^{n,d}$ such that each element of $W$ is independently distributed according to $N(0, 1/n)$, we have
$$\sup_{x \in S} \left| \frac{\|W x\|}{\|x\|} - 1 \right| < \epsilon.$$
Proof. It suffices to prove the lemma for all $x \in S$ with $\|x\| = 1$. We can write $x = U_I \alpha$, where $\alpha \in \mathbb{R}^s$, $\|\alpha\|_2 = 1$, and $U_I$ is the matrix whose columns are $\{U_i : i \in I\}$. Using Lemma 23.11 we know that there exists a set $Q$ of size $|Q| \le (12/\epsilon)^s$ such that
$$\sup_{\alpha : \|\alpha\| = 1} \; \min_{v \in Q} \|\alpha - v\| \le \epsilon/4.$$
But since $U$ is orthogonal, multiplication by $U_I$ preserves norms, so we also have that
$$\sup_{\alpha : \|\alpha\| = 1} \; \min_{v \in Q} \|U_I \alpha - U_I v\| \le \epsilon/4.$$
Applying Lemma 23.4 to the set $\{U_I v : v \in Q\}$, we obtain that for $n$ satisfying the condition given in the lemma, the following holds with probability of at least $1 - \delta$:
$$\sup_{v \in Q} \left| \frac{\|W U_I v\|^2}{\|U_I v\|^2} - 1 \right| \le \epsilon/2.$$
This also implies that
$$\sup_{v \in Q} \left| \frac{\|W U_I v\|}{\|U_I v\|} - 1 \right| \le \epsilon/2.$$
Let $a$ be the smallest number such that
$$\forall x \in S, \quad \frac{\|W x\|}{\|x\|} \le 1 + a.$$
Clearly $a < \infty$. Our goal is to show that $a \le \epsilon$. This follows from the fact that for any $x \in S$ of unit norm there exists $v \in Q$ such that $\|x - U_I v\| \le \epsilon/4$, and therefore
$$\|W x\| \le \|W U_I v\| + \|W(x - U_I v)\| \le 1 + \epsilon/2 + (1 + a)\,\epsilon/4,$$
where the last term uses the definition of $a$ together with $x - U_I v \in S$ and $\|x - U_I v\| \le \epsilon/4$.
Thus,
$$\forall x \in S, \quad \frac{\|W x\|}{\|x\|} \le 1 + \left(\epsilon/2 + (1 + a)\,\epsilon/4\right).$$
But the definition of $a$ implies that
$$a \le \epsilon/2 + (1 + a)\,\epsilon/4 \quad \Rightarrow \quad a \le \frac{\epsilon/2 + \epsilon/4}{1 - \epsilon/4} \le \epsilon.$$
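The last inequality, $\frac{\epsilon/2 + \epsilon/4}{1 - \epsilon/4} \le \epsilon$, can be checked directly by clearing the denominator (assuming $\epsilon \le 1$, which holds in the regime where the lemma is applied):

$$\frac{\epsilon/2 + \epsilon/4}{1 - \epsilon/4} \le \epsilon \;\Longleftrightarrow\; \frac{3\epsilon}{4} \le \epsilon - \frac{\epsilon^2}{4} \;\Longleftrightarrow\; \frac{\epsilon^2}{4} \le \frac{\epsilon}{4} \;\Longleftrightarrow\; \epsilon \le 1.$$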
This proves that for all $x \in S$ we have $\frac{\|W x\|}{\|x\|} - 1 \le \epsilon$. The other side follows from this as well, since
$$\|W x\| \ge \|W U_I v\| - \|W(x - U_I v)\| \ge 1 - \epsilon/2 - (1 + \epsilon)\,\epsilon/4 \ge 1 - \epsilon.$$
The preceding lemma tells us that for $x \in S$ of unit norm we have
$$(1 - \epsilon) \le \|W x\| \le (1 + \epsilon),$$