only happen if our sample is in the set of misleading samples, M. Formally, we have
shown that
$$\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\} \subseteq M.$$
Note that we can rewrite M as
$$M = \bigcup_{h \in \mathcal{H}_B} \{S|_x : L_S(h) = 0\}. \tag{2.5}$$
Hence,
$$\mathcal{D}^m\big(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\big) \le \mathcal{D}^m(M) = \mathcal{D}^m\Big(\bigcup_{h \in \mathcal{H}_B} \{S|_x : L_S(h) = 0\}\Big). \tag{2.6}$$
Next, we upper bound the right-hand side of the preceding equation using the
union bound – a basic property of probabilities.
Lemma 2.2 (Union Bound). For any two sets $A$, $B$ and a distribution $\mathcal{D}$ we have
$$\mathcal{D}(A \cup B) \le \mathcal{D}(A) + \mathcal{D}(B).$$
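Lemma 2.2 is stated for two sets, but applying it repeatedly extends it to any finite union, which is the form needed in the next step. As a supplementary remark, for sets $A_1,\dots,A_k$,
$$\mathcal{D}\Big(\bigcup_{i=1}^{k} A_i\Big) \le \mathcal{D}(A_1) + \mathcal{D}\Big(\bigcup_{i=2}^{k} A_i\Big) \le \cdots \le \sum_{i=1}^{k} \mathcal{D}(A_i),$$
where each inequality is one application of Lemma 2.2; the general claim follows by induction on $k$.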
Applying the union bound to the right-hand side of Equation (2.6) yields
$$\mathcal{D}^m\big(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\big) \le \sum_{h \in \mathcal{H}_B} \mathcal{D}^m\big(\{S|_x : L_S(h) = 0\}\big). \tag{2.7}$$
Next, let us bound each summand of the right-hand side of the preceding inequality. Fix some "bad" hypothesis $h \in \mathcal{H}_B$. The event $L_S(h) = 0$ is equivalent to the event $\forall i,\ h(x_i) = f(x_i)$. Since the examples in the training set are sampled i.i.d. we get that
$$\mathcal{D}^m\big(\{S|_x : L_S(h) = 0\}\big) = \mathcal{D}^m\big(\{S|_x : \forall i,\ h(x_i) = f(x_i)\}\big) = \prod_{i=1}^{m} \mathcal{D}\big(\{x_i : h(x_i) = f(x_i)\}\big). \tag{2.8}$$
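To see the role of independence in Equation (2.8), the following small simulation (a minimal sketch with arbitrary toy choices of distribution, hypothesis, and sample size, none of which come from the text) estimates the probability that a fixed hypothesis agrees with the target on every one of $m$ i.i.d. examples and compares it with the product of the per-example agreement probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup (arbitrary choices, not from the text): instances are
# uniform on [0, 1], and a fixed hypothesis h agrees with the target f exactly
# on [0.1, 1], so the per-example probability D({x : h(x) = f(x)}) is p = 0.9.
p = 0.9          # per-example agreement probability
m = 20           # training-set size
trials = 200_000

samples = rng.random((trials, m))   # `trials` independent training sets of size m
agree = samples >= 0.1              # h(x_i) = f(x_i) exactly when x_i >= 0.1
all_agree = agree.all(axis=1)       # the event {forall i : h(x_i) = f(x_i)}

# By the i.i.d. decomposition in Equation (2.8), this event has probability p**m.
print("Monte Carlo estimate:", all_agree.mean())
print("product p**m        :", p ** m)
```

With these toy numbers both quantities come out near $0.9^{20} \approx 0.12$.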
For each individual sampling of an element of the training set we have
$$\mathcal{D}\big(\{x_i : h(x_i) = y_i\}\big) = 1 - L_{(\mathcal{D},f)}(h) \le 1 - \epsilon,$$
where the last inequality follows from the fact that $h \in \mathcal{H}_B$. Combining the previous equation with Equation (2.8) and using the inequality $1 - \epsilon \le e^{-\epsilon}$ we obtain that for every $h \in \mathcal{H}_B$,
$$\mathcal{D}^m\big(\{S|_x : L_S(h) = 0\}\big) \le (1 - \epsilon)^m \le e^{-\epsilon m}. \tag{2.9}$$
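The inequality $1 - \epsilon \le e^{-\epsilon}$ used in the last step is the standard tangent-line bound for the exponential: $e^{-x}$ is convex and $1 - x$ is its tangent at $x = 0$, so for every real $x$,
$$1 - x \le e^{-x}, \qquad \text{and hence} \qquad (1-\epsilon)^m \le \big(e^{-\epsilon}\big)^m = e^{-\epsilon m}.$$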
Combining this equation with Equation (2.7) we conclude that
$$\mathcal{D}^m\big(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\big) \le |\mathcal{H}_B|\, e^{-\epsilon m} \le |\mathcal{H}|\, e^{-\epsilon m}.$$
A graphical illustration which explains how we used the union bound is given in
Figure 2.1.
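This bound also explains where the sample size requirement in the corollary below comes from: requiring that the probability of drawing a misleading sample be at most $\delta$ and solving for $m$ gives
$$|\mathcal{H}|\, e^{-\epsilon m} \le \delta \;\iff\; \epsilon m \ge \log\big(|\mathcal{H}|/\delta\big) \;\iff\; m \ge \frac{\log(|\mathcal{H}|/\delta)}{\epsilon}.$$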
Corollary 2.3. Let $\mathcal{H}$ be a finite hypothesis class. Let $\delta \in (0,1)$ and $\epsilon > 0$ and let $m$ be an integer that satisfies
$$m \ge \frac{\log(|\mathcal{H}|/\delta)}{\epsilon}.$$
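As a quick numerical illustration of this sample complexity bound, the sketch below (the values of $|\mathcal{H}|$, $\epsilon$, and $\delta$ are arbitrary choices, not taken from the text) shows that the required $m$ grows only logarithmically with $|\mathcal{H}|$ and with $1/\delta$.

```python
import math

def sample_complexity(h_size: int, eps: float, delta: float) -> int:
    """Smallest integer m with m >= log(|H| / delta) / eps (natural log)."""
    return math.ceil(math.log(h_size / delta) / eps)

# Arbitrary illustrative values of |H|, with eps = 0.1 and delta = 0.01.
for h_size in (100, 10_000, 1_000_000):
    print(f"|H| = {h_size:>9,}  ->  m >= {sample_complexity(h_size, 0.1, 0.01)}")
```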