5.1 The No-Free-Lunch Theorem 39
{0,1} and every i we have
$$
L_{\mathcal{D}_i}(h) \;=\; \frac{1}{2m}\sum_{x\in C}\mathbb{1}_{[h(x)\neq f_i(x)]}
\;\geq\; \frac{1}{2m}\sum_{r=1}^{p}\mathbb{1}_{[h(v_r)\neq f_i(v_r)]}
\;\geq\; \frac{1}{2p}\sum_{r=1}^{p}\mathbb{1}_{[h(v_r)\neq f_i(v_r)]}. \tag{5.5}
$$
Hence,
$$
\frac{1}{T}\sum_{i=1}^{T} L_{\mathcal{D}_i}\!\big(A(S_j^i)\big)
\;\geq\; \frac{1}{T}\sum_{i=1}^{T}\frac{1}{2p}\sum_{r=1}^{p}\mathbb{1}_{[A(S_j^i)(v_r)\neq f_i(v_r)]}
\;=\; \frac{1}{2p}\sum_{r=1}^{p}\frac{1}{T}\sum_{i=1}^{T}\mathbb{1}_{[A(S_j^i)(v_r)\neq f_i(v_r)]}
\;\geq\; \frac{1}{2}\cdot\min_{r\in[p]}\frac{1}{T}\sum_{i=1}^{T}\mathbb{1}_{[A(S_j^i)(v_r)\neq f_i(v_r)]}. \tag{5.6}
$$
Next, fix some $r \in [p]$. We can partition all the functions in $f_1,\ldots,f_T$ into $T/2$ disjoint pairs, where for a pair $(f_i, f_{i'})$ we have that for every $c \in C$, $f_i(c) \neq f_{i'}(c)$ if and only if $c = v_r$. Since for such a pair we must have $S_j^i = S_j^{i'}$, it follows that
$$
\mathbb{1}_{[A(S_j^i)(v_r)\neq f_i(v_r)]} + \mathbb{1}_{[A(S_j^{i'})(v_r)\neq f_{i'}(v_r)]} = 1,
$$
which yields
$$
\frac{1}{T}\sum_{i=1}^{T}\mathbb{1}_{[A(S_j^i)(v_r)\neq f_i(v_r)]} = \frac{1}{2}.
$$
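To see the pairing argument in a minimal instance (an illustration added here, not part of the original proof): suppose the only point of $C$ outside the sample is $v_r$, and the sample consists of a single point $c$. The $T = 4$ labelings of $C = \{c, v_r\}$ are
$$
\begin{aligned}
f_1(c)=0,\quad f_1(v_r)=0, &\qquad f_2(c)=0,\quad f_2(v_r)=1,\\
f_3(c)=1,\quad f_3(v_r)=0, &\qquad f_4(c)=1,\quad f_4(v_r)=1,
\end{aligned}
$$
and the pairs are $(f_1, f_2)$ and $(f_3, f_4)$: the members of each pair agree on $c$, hence induce the same training sample, so $A$ makes the same prediction at $v_r$ for both, agreeing with exactly one member's label. The average of the indicators over the four functions is therefore $2/4 = 1/2$.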
Combining this with Equation (5.6), Equation (5.4), and Equation (5.3), we obtain
that Equation (5.1) holds, which concludes our proof.
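The averaging step at the heart of the proof can be checked empirically on a tiny domain. The sketch below (an illustration with assumed names, not code from the book) fixes one training sample of size m from a domain C of size 2m, enumerates all T = 2^{2m} labelings, and averages the true error of a simple memorizing learner; over all labelings the unseen half of the domain is mislabeled half the time, so the average loss is exactly 1/4.

```python
from itertools import product

# Empirical check of the No-Free-Lunch averaging argument on a small domain.
# Domain C has 2m points; one fixed training sample S contains the first m.
m = 3
C = list(range(2 * m))
seen = set(C[:m])          # points the learner observes
# unseen points play the role of v_1, ..., v_p

def learner(labels):
    """A: memorize the training labels, predict 0 elsewhere (any fixed rule works)."""
    return lambda x: labels[x] if x in seen else 0

losses = []
for f in product([0, 1], repeat=len(C)):    # all T = 2^{2m} labelings f_i
    h = learner(dict(zip(C, f)))
    # L_D(h): 0-1 loss under the uniform distribution over C
    losses.append(sum(h(x) != f[x] for x in C) / len(C))

avg = sum(losses) / len(losses)
print(avg)   # averaged over all labelings, the loss is exactly 1/4
```

The choice of learner is immaterial: whatever fixed prediction rule is used on the unseen points, each unseen point is mislabeled for exactly half of the labelings, which is what the pairing argument formalizes.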
5.1.1 No-Free-Lunch and Prior Knowledge
How does the No-Free-Lunch result relate to the need for prior knowledge? Let us
consider an ERM predictor over the hypothesis class H of all the functions f from
X to {0,1}. This class represents lack of prior knowledge: Every possible function
from the domain to the label set is considered a good candidate. According to the
No-Free-Lunch theorem, any algorithm that chooses its output from hypotheses in
H, and in particular the ERM predictor, will fail on some learning task. Therefore,
this class is not PAC learnable, as formalized in the following corollary:
Corollary 5.2. Let X be an infinite domain set and let H be the set of all functions
from X to {0,1}. Then, H is not PAC learnable.
Proof. Assume, by way of contradiction, that the class is learnable. Choose some ε < 1/8 and δ < 1/7. By the definition of PAC learnability, there must be some