Page 362 - Understanding Machine Learning
P. 362

Proof of the Fundamental Theorem of Learning Theory
           344
                                      1
                                    ≥      (P − [y]1 [A(y) =+] + P − [y]1 [A(y) =−] )
                                      2
                                       y∈N  +
                                        1
                                     +       (P + [y]1 [A(y) =+] + P + [y]1 [A(y) =−] )
                                        2
                                         y∈N −
                                      1            1
                                    =       P − [y] +   P + [y].
                                      2            2
                                       y∈N +        y∈N  −

                 Next note that    + P − [y] =   − P + [y], and both values are the probability that
                                y∈N           y∈N
                 a Binomial (m,(1 − 
)/2) random variable will have value greater than m/2. Using
                 Lemma B.11, this probability is lower bounded by
                        1                                 1
                                            2
                                                   2
                                                                               2
                           1 −  1 − exp( − m
 /(1 − 
 ))  ≥  1 −  1 − exp( − 2m
 ) ,
                        2                                 2
                                                 2
                 where we used the assumption that 
 ≤ 1/2. It follows that if m ≤ 0.5log(1/(4δ))/
 2
                 then there exists b such that
                                                          (h b ) = 
]
                                         P[L D b  (A(y)) − L D b

                                             1           √
                                           ≥    1 −   1 −  4δ ≥ δ,
                                             2
                 where the last inequality follows by standard algebraic manipulations. This con-
                 cludes our proof.


                 28.2.2 Showing That m(
,1/8) ≥ 8d/
   2
                                                      √
                                                                              8d
                 We shall now prove that for every 
< 1/(8 2) we have that m(
,δ) ≥  2  .
                                                 √
                    Let ρ = 8
 and note that ρ ∈ (0,1/ 2). We will construct a family of distributions
                 as follows. First, let C ={c 1 ,...,c d } be a set of d instances which are shattered by H.
                                                      d
                 Second, for each vector (b 1 ,...,b d ) ∈{±1} , define a distribution D b such that

                                                  1  1+yb i ρ
                                                  d  ·  2    if ∃i : x = c i
                                    D b ({(x, y)}) =
                                                  0          otherwise.
                 That is, to sample an example according to D b , we first sample an element c i ∈ C
                 uniformly at random, and then set the label to be b i with probability (1 + ρ)/2or
                 −b i with probability (1 − ρ)/2.
                    It is easy to verify that the Bayes optimal predictor for D b is the hypothesis h ∈ H
                 such that h(c i ) = b i for all i ∈ [d], and its error is  1−ρ  . In addition, for any other
                                                               2
                 function f : X →{±1}, it is easy to verify that
                                1 + ρ  |{i ∈ [d]: f (c i )  = b i }|  1 − ρ  |{i ∈ [d]: f (c i ) = b i }|
                          ( f ) =    ·                   +      ·                  .
                       L D b
                                  2            d            2             d
                 Therefore,
                                                        |{i ∈ [d]: f (c i )  = b i }|
                                                (h) = ρ ·                 .         (28.2)
                                L D b  ( f ) − min L D b
                                         h∈H                    d
   357   358   359   360   361   362   363   364   365   366   367