Learning via Uniform Convergence
Proof. For every h ∈ H,

L_D(h_S) ≤ L_S(h_S) + ε/2 ≤ L_S(h) + ε/2 ≤ L_D(h) + ε/2 + ε/2 = L_D(h) + ε,

where the first and third inequalities are due to the assumption that S is ε/2-representative (Definition 4.1) and the second inequality holds since h_S is an ERM predictor.
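To make the lemma concrete, here is a minimal numerical sketch in Python, assuming a toy setup that is not from the text: a ten-point domain with noisy labels (so the setting is agnostic), the 0–1 loss, and a small finite class of threshold predictors. It checks whether a sample S is ε/2-representative and then compares the ERM predictor's true loss with the best true loss in the class plus ε.

```python
import random

# Toy setup (illustrative, not from the text): X = {0,...,9}, noisy labels,
# 0-1 loss, and a finite class H of threshold predictors.
random.seed(0)
X = list(range(10))
p = {x: 0.9 if x >= 5 else 0.1 for x in X}                    # P[y = 1 | x] under D
H = [(lambda x, t=t: 1 if x >= t else 0) for t in range(11)]  # thresholds t = 0,...,10

def true_loss(h):
    # L_D(h): exact expected 0-1 loss when x is uniform and y ~ Bernoulli(p[x]).
    return sum((p[x] * (1 - h(x)) + (1 - p[x]) * h(x)) / len(X) for x in X)

def emp_loss(h, S):
    # L_S(h): average 0-1 loss on the sample S.
    return sum(1 for (x, y) in S if h(x) != y) / len(S)

def draw_sample(m):
    return [(x, 1 if random.random() < p[x] else 0) for x in random.choices(X, k=m)]

eps = 0.2
S = draw_sample(2000)

# Is S (eps/2)-representative (Definition 4.1)?
gap = max(abs(emp_loss(h, S) - true_loss(h)) for h in H)
print("max_h |L_S(h) - L_D(h)| =", round(gap, 3), " (eps/2 =", eps / 2, ")")

# ERM over the finite class, and the conclusion of Lemma 4.2.
h_S = min(H, key=lambda h: emp_loss(h, S))
best = min(true_loss(h) for h in H)
print("L_D(h_S) =", round(true_loss(h_S), 3), " min_h L_D(h) + eps =", round(best + eps, 3))
```

With a sample this large the printed gap typically falls below ε/2, and the final comparison then mirrors the chain of inequalities in the proof.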
The preceding lemma implies that to ensure that the ERM rule is an agnostic PAC learner, it suffices to show that with probability of at least 1 − δ over the random choice of a training set, it will be an ε-representative training set. The uniform convergence condition formalizes this requirement.
Definition 4.3 (Uniform Convergence). We say that a hypothesis class H has the uniform convergence property (w.r.t. a domain Z and a loss function ℓ) if there exists a function m_H^UC : (0,1)² → N such that for every ε, δ ∈ (0,1) and for every probability distribution D over Z, if S is a sample of m ≥ m_H^UC(ε, δ) examples drawn i.i.d. according to D, then, with probability of at least 1 − δ, S is ε-representative.
Similar to the definition of sample complexity for PAC learning, the function m_H^UC measures the (minimal) sample complexity of obtaining the uniform convergence property, namely, how many examples we need to ensure that with probability of at least 1 − δ the sample would be ε-representative.
The term uniform here refers to having a fixed sample size that works for all
members of H and over all possible probability distributions over the domain.
The following corollary follows directly from Lemma 4.2 and the definition of
uniform convergence.
Corollary 4.4. If a class H has the uniform convergence property with a function m_H^UC, then the class is agnostically PAC learnable with the sample complexity m_H(ε, δ) ≤ m_H^UC(ε/2, δ). Furthermore, in that case, the ERM_H paradigm is a successful agnostic PAC learner for H.
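To spell out the step the corollary compresses (a sketch in the notation above, combining Definition 4.3 with Lemma 4.2): if S consists of m ≥ m_H^UC(ε/2, δ) i.i.d. examples, then with probability of at least 1 − δ the sample S is ε/2-representative, in which case Lemma 4.2 gives

L_D(h_S) ≤ min_{h ∈ H} L_D(h) + ε,

which is precisely the guarantee required of an agnostic PAC learner with parameters ε and δ.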
4.2 FINITE CLASSES ARE AGNOSTIC PAC LEARNABLE
In view of Corollary 4.4, the claim that every finite hypothesis class is agnostic PAC
learnable will follow once we establish that uniform convergence holds for a finite
hypothesis class.
To show that uniform convergence holds we follow a two-step argument, similar
to the derivation in Chapter 2. The first step applies the union bound while the
second step employs a measure concentration inequality. We now explain these two
steps in detail.
Fix some ε, δ. We need to find a sample size m that guarantees that for any D, with probability of at least 1 − δ over the choice of S = (z_1, ..., z_m) sampled i.i.d. from D, we have that for all h ∈ H, |L_S(h) − L_D(h)| ≤ ε. That is,

D^m({S : ∀h ∈ H, |L_S(h) − L_D(h)| ≤ ε}) ≥ 1 − δ.
Equivalently, we need to show that

D^m({S : ∃h ∈ H, |L_S(h) − L_D(h)| > ε}) < δ.
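The two-step argument outlined above (a union bound over H followed by a measure concentration inequality) bounds exactly this probability. For intuition only, the following Monte Carlo sketch, reusing the same kind of toy threshold class and invented distribution as before (none of it from the text), estimates the left-hand side for a few sample sizes and shows it shrinking as m grows.

```python
import random

# Monte Carlo sketch (toy setup, not from the text): estimate
#   D^m({S : exists h in H with |L_S(h) - L_D(h)| > eps})
# for a small finite class, and watch it shrink as m grows.
random.seed(1)
X = list(range(10))
p = {x: 0.9 if x >= 5 else 0.1 for x in X}                    # P[y = 1 | x] under D
H = [(lambda x, t=t: 1 if x >= t else 0) for t in range(11)]  # threshold predictors

def true_loss(h):
    # Exact L_D(h) for the 0-1 loss: x uniform, y ~ Bernoulli(p[x]).
    return sum((p[x] * (1 - h(x)) + (1 - p[x]) * h(x)) / len(X) for x in X)

def emp_loss(h, S):
    return sum(1 for (x, y) in S if h(x) != y) / len(S)

true = [true_loss(h) for h in H]
eps, trials = 0.1, 500
for m in (50, 200, 800):
    bad = 0
    for _ in range(trials):
        S = [(x, 1 if random.random() < p[x] else 0) for x in random.choices(X, k=m)]
        if any(abs(emp_loss(h, S) - lv) > eps for h, lv in zip(H, true)):
            bad += 1
    print(f"m = {m:4d}: estimated failure probability ~ {bad / trials:.3f}")
```

The estimated failure probability drops rapidly with m, which is the behavior the union bound and the concentration inequality will quantify.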