Page 356 - Understanding Machine Learning
P. 356

Covering Numbers
           338

                 27.1.1 Properties
                 The following lemma is immediate from the definition.
                                                                       m
                                           m
                 Lemma 27.2. For any A ⊂ R ,scalar c > 0, and vector a 0 ∈ R , we have
                                   ∀r > 0, N(r,{ca + a 0 : a ∈ A}) ≤ N(cr, A).

                    Next, we derive a contraction principle.
                 Lemma 27.3. For each i ∈ [m],let φ i : R → R be a ρ-Lipschitz function; namely, for
                                                                  m
                 all α,β ∈ R we have |φ i (α) − φ i (β)|≤ ρ |α − β|.For a ∈ R let φ(a) denote the vector
                 (φ 1 (a 1 ),...,φ m (a m )).Let φ ◦ A ={φ(a): a ∈ A}. Then,
                                           N(ρ r,φ ◦ A) ≤ N(r, A).




                 Proof. Define B = φ ◦ A.Let A be an r-cover of A and define B = φ ◦ A . Then, for


                 all a ∈ A there exists a ∈ A with  a − a  ≤ r. So,


                                      2                    2  2           2     2
                          φ(a) − φ(a )  =   (φ i (a i ) − φ i (a )) ≤ ρ  (a i − a ) ≤ (ρr) .
                                                                        i
                                                       i
                                          i                     i

                 Hence, B is an (ρ r)-cover of B.
                 27.2 FROM COVERING TO RADEMACHER COMPLEXITY VIA CHAINING
                 The following lemma bounds the Rademacher complexity of A basedonthe cov-
                 ering numbers N(r, A). This technique is called Chaining and is attributed to
                 Dudley.
                 Lemma 27.4. Let c = min ¯ a max a∈A  a − ¯ a . Then, for any integer M > 0,
                                                   M
                                        c2 −M   6c     −k
                                 R(A) ≤ √     +       2    log(N(c2 −k , A)).
                                           m    m
                                                   k=1
                 Proof. Let ¯ a be a minimizer of the objective function given in the definition of c.
                 On the basis of Lemma 26.6, we can analyze the Rademacher complexity assuming
                 that ¯ a = 0.
                    Consider the set B 0 ={0} and note that it is a c-cover of A.Let B 1 ,..., B M
                                                                   −k
                                                                                      ∗
                 be sets such that each B k corresponds to a minimal (c2 )-cover of A.Let a =
                 argmax     σ,a  (where if there is more than one maximizer, choose one in an arbi-
                        a∈A
                 trary way, and if a maximizer does not exist, choose a such that  σ,a   is close
                                                                                ∗
                                                                  ∗
                 enough to the supremum). Note that a is a function of σ. For each k,let b (k)  be the
                                                   ∗
                                    ∗
                 nearest neighbor of a in B k (hence b (k)  is also a function of σ). Using the triangle
                 inequality,
                                                                                  −k
                        b (k)  − b (k−1)  ≤  b (k)  − a  +  a − b (k−1)  ≤ c(2 −k  + 2 −(k−1) ) = 3c 2 .
                                            ∗
                                                  ∗
                 For each k define the set
                                                                        −k
                                ˆ
                                B k ={(a − a ): a ∈ B k ,a ∈ B k−1 , a − a  ≤ 3c2 }.
   351   352   353   354   355   356   357   358   359   360   361