Page 357 - Understanding Machine Learning
27.2 From Covering to Rademacher Complexity via Chaining 339
We can now write
$$
\begin{aligned}
R(A) &= \frac{1}{m}\,\mathbb{E}_{\sigma}\big[\langle \sigma, a^{*}\rangle\big] \\
&= \frac{1}{m}\,\mathbb{E}_{\sigma}\Big[\langle \sigma,\, a^{*}-b^{(M)}\rangle + \sum_{k=1}^{M}\langle \sigma,\, b^{(k)}-b^{(k-1)}\rangle\Big] \\
&\le \frac{1}{m}\,\mathbb{E}_{\sigma}\big[\|\sigma\|\,\|a^{*}-b^{(M)}\|\big] + \sum_{k=1}^{M}\frac{1}{m}\,\mathbb{E}_{\sigma}\Big[\sup_{a\in \hat{B}_k}\langle \sigma, a\rangle\Big].
\end{aligned}
$$
Since $\|\sigma\| = \sqrt{m}$ and $\|a^{*}-b^{(M)}\| \le c\,2^{-M}$, the first summand is at most $\frac{c\,2^{-M}}{\sqrt{m}}$.
Additionally, each element of $\hat{B}_k$ is of the form $b^{(k)} - b^{(k-1)}$ and therefore has norm at most $c\,2^{-k} + c\,2^{-(k-1)} = 3c\,2^{-k}$, while $|\hat{B}_k| \le N(c\,2^{-k},A)^2$. Hence, by Massart's lemma,
$$
\frac{1}{m}\,\mathbb{E}\Big[\sup_{a\in \hat{B}_k}\langle \sigma, a\rangle\Big] \;\le\; 3c\,2^{-k}\,\frac{\sqrt{2\log\big(N(c\,2^{-k},A)^2\big)}}{m} \;=\; 6c\,2^{-k}\,\frac{\sqrt{\log\big(N(c\,2^{-k},A)\big)}}{m}.
$$
Therefore,
$$
R(A) \;\le\; \frac{c\,2^{-M}}{\sqrt{m}} \;+\; \frac{6c}{m}\sum_{k=1}^{M} 2^{-k}\sqrt{\log\big(N(c\,2^{-k},A)\big)}.
$$
As a corollary we obtain the following:

Lemma 27.5. Assume that there are $\alpha,\beta > 0$ such that for any $k \ge 1$ we have
$$
\sqrt{\log\big(N(c\,2^{-k},A)\big)} \;\le\; \alpha + \beta k.
$$
Then,
$$
R(A) \;\le\; \frac{6c}{m}\,(\alpha + 2\beta).
$$
Proof. The bound follows from Lemma 27.4 by taking $M \to \infty$ and noting that $\sum_{k=1}^{\infty} 2^{-k} = 1$ and $\sum_{k=1}^{\infty} k\,2^{-k} = 2$.
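Spelling out the proof's last step: substituting the assumed bound into Lemma 27.4 and letting $M \to \infty$ gives

```latex
R(A) \;\le\; \lim_{M\to\infty}\frac{c\,2^{-M}}{\sqrt{m}}
  \;+\; \frac{6c}{m}\sum_{k=1}^{\infty} 2^{-k}(\alpha + \beta k)
  \;=\; 0 \;+\; \frac{6c}{m}\Big(\alpha\sum_{k=1}^{\infty}2^{-k}
  \;+\; \beta\sum_{k=1}^{\infty}k\,2^{-k}\Big)
  \;=\; \frac{6c}{m}(\alpha + 2\beta).
```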
Example 27.2. Consider a set $A$ which lies in a $d$-dimensional subspace of $\mathbb{R}^m$ and such that $c = \max_{a\in A}\|a\|$. We have shown that $N(r,A) \le \big(\frac{2c\sqrt{d}}{r}\big)^{d}$. Therefore, for any $k$,
$$
\begin{aligned}
\sqrt{\log\big(N(c\,2^{-k},A)\big)} &\le \sqrt{d\,\log\big(2^{k+1}\sqrt{d}\big)} \\
&\le \sqrt{d\,\log\big(2\sqrt{d}\big) + kd} \\
&\le \sqrt{d\,\log\big(2\sqrt{d}\big)} + \sqrt{d}\,k.
\end{aligned}
$$
Hence Lemma 27.5 yields
$$
R(A) \;\le\; \frac{6c}{m}\Big(\sqrt{d\,\log\big(2\sqrt{d}\big)} + 2\sqrt{d}\Big) \;=\; O\!\left(\frac{c\sqrt{d\,\log(d)}}{m}\right).
$$
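As a quick numerical sanity check (a sketch, not part of the text), the series identities used in the proof of Lemma 27.5 and the inequality chain of Example 27.2 can be verified directly:

```python
import math

# Series identities from the proof of Lemma 27.5:
# sum_{k>=1} 2^{-k} = 1  and  sum_{k>=1} k * 2^{-k} = 2.
geo = sum(2.0 ** -k for k in range(1, 200))
weighted = sum(k * 2.0 ** -k for k in range(1, 200))
print(geo, weighted)  # both partial sums are at their limits 1 and 2, up to rounding

# Inequality chain from Example 27.2, checked on a small grid of d and k:
#   sqrt(d * log(2^{k+1} sqrt(d))) <= sqrt(d * log(2 sqrt(d)) + k*d)
#                                  <= sqrt(d * log(2 sqrt(d))) + sqrt(d) * k
for d in (1, 2, 10, 100):
    for k in range(1, 20):
        lhs = math.sqrt(d * math.log(2 ** (k + 1) * math.sqrt(d)))
        mid = math.sqrt(d * math.log(2 * math.sqrt(d)) + k * d)
        rhs = math.sqrt(d * math.log(2 * math.sqrt(d))) + math.sqrt(d) * k
        assert lhs <= mid <= rhs
print("inequality chain holds on the tested grid")
```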