Page 343 - Understanding Machine Learning

P. 343

Rademacher Complexities

In Chapter 4 we have shown that uniform convergence is a sufﬁcient condition for
learnability. In this chapter we study the Rademacher complexity, which measures
the rate of uniform convergence. We will provide generalization bounds based on
this measure.

26.1 THE RADEMACHER COMPLEXITY
Recall the deﬁnition of an
-representative sample from Chapter 4, repeated here
for convenience.
Deﬁnition 26.1 (
-Representative Sample). A training set S is called

-representative (w.r.t. domain Z, hypothesis class H, loss function , and distri-
bution D)if
sup|L D (h) − L S (h)|≤
.
h∈H
We have shown that if S is an
/2 representative sample then the ERM rule is

-consistent, namely, L D (ERM H (S)) ≤ min h∈H L D (h) +
.
To simplify our notation, let us denote
def def
F = ◦ H ={z → (h,z): h ∈ H},
and given f ∈ F, we deﬁne
m
1
L D ( f ) = E [ f (z)], L S ( f ) = f (z i ).
z∼D m
i=1
We deﬁne the representativeness of S with respect to F as the largest gap between
the true error of a function f and its empirical error, namely,
def
Rep (F, S) = sup (L D ( f ) − L S ( f )). (26.1)
D
f ∈F
Now, suppose we would like to estimate the representativeness of S using the
sample S only. One simple idea is to split S into two disjoint sets, S = S 1 ∪ S 2 ;

325

338 339 340 341 342 343 344 345 346 347 348