20.4 THE SAMPLE COMPLEXITY OF NEURAL NETWORKS
Next we discuss the sample complexity of learning the class $\mathcal{H}_{V,E,\sigma}$. Recall that the fundamental theorem of learning tells us that the sample complexity of learning a hypothesis class of binary classifiers depends on its VC dimension. Therefore, we focus on calculating the VC dimension of hypothesis classes of the form $\mathcal{H}_{V,E,\sigma}$, where the output layer of the graph contains a single neuron.

We start with the sign activation function, namely, with $\mathcal{H}_{V,E,\mathrm{sign}}$. What is the VC dimension of this class? Intuitively, since we learn $|E|$ parameters, the VC dimension should be on the order of $|E|$. This is indeed the case, as formalized by the following theorem.
Theorem 20.6. The VC dimension of $\mathcal{H}_{V,E,\mathrm{sign}}$ is $O(|E|\log(|E|))$.



Proof. To simplify the notation throughout the proof, let us denote the hypothesis class by $\mathcal{H}$. Recall the definition of the growth function, $\tau_{\mathcal{H}}(m)$, from Section 6.5.1. This function measures $\max_{C \subset \mathcal{X} : |C| = m} |\mathcal{H}_C|$, where $\mathcal{H}_C$ is the restriction of $\mathcal{H}$ to functions from $C$ to $\{0,1\}$. We can naturally extend the definition for a set of functions from $\mathcal{X}$ to some finite set $Y$, by letting $\mathcal{H}_C$ be the restriction of $\mathcal{H}$ to functions from $C$ to $Y$, and keeping the definition of $\tau_{\mathcal{H}}(m)$ intact.
Our neural network is defined by a layered graph. Let $V_0,\ldots,V_T$ be the layers of the graph. Fix some $t \in [T]$. By assigning different weights on the edges between $V_{t-1}$ and $V_t$, we obtain different functions from $\mathbb{R}^{|V_{t-1}|}$ to $\{\pm 1\}^{|V_t|}$. Let $\mathcal{H}^{(t)}$ be the class of all possible such mappings from $\mathbb{R}^{|V_{t-1}|}$ to $\{\pm 1\}^{|V_t|}$. Then, $\mathcal{H}$ can be written as a composition, $\mathcal{H} = \mathcal{H}^{(T)} \circ \cdots \circ \mathcal{H}^{(1)}$. In Exercise 20.4 we show that the growth function of a composition of hypothesis classes is bounded by the product of the growth functions of the individual classes. Therefore,
$$\tau_{\mathcal{H}}(m) \;\le\; \prod_{t=1}^{T} \tau_{\mathcal{H}^{(t)}}(m).$$
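As a concrete illustration (a hypothetical small network, not taken from the text), suppose the graph has three layers of sizes $|V_0| = d$, $|V_1| = k$, and $|V_2| = 1$. Then $\mathcal{H}^{(1)}$ contains mappings from $\mathbb{R}^{d}$ to $\{\pm 1\}^{k}$, $\mathcal{H}^{(2)}$ contains mappings from $\mathbb{R}^{k}$ to $\{\pm 1\}$, and the composition bound above specializes to
$$\tau_{\mathcal{H}}(m) \;\le\; \tau_{\mathcal{H}^{(2)}}(m)\,\tau_{\mathcal{H}^{(1)}}(m).$$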
In addition, each $\mathcal{H}^{(t)}$ can be written as a product of function classes, $\mathcal{H}^{(t)} = \mathcal{H}^{(t,1)} \times \cdots \times \mathcal{H}^{(t,|V_t|)}$, where each $\mathcal{H}^{(t,j)}$ is the set of all functions from layer $t-1$ to $\{\pm 1\}$ that the $j$th neuron of layer $t$ can implement. In Exercise 20.3 we bound the growth function of product classes, and this yields
$$\tau_{\mathcal{H}^{(t)}}(m) \;\le\; \prod_{i=1}^{|V_t|} \tau_{\mathcal{H}^{(t,i)}}(m).$$
Let $d_{t,i}$ be the number of edges that are headed to the $i$th neuron of layer $t$. Since the neuron is a homogeneous halfspace hypothesis and the VC dimension of homogeneous halfspaces is the dimension of their input, we have by Sauer's lemma that
$$\tau_{\mathcal{H}^{(t,i)}}(m) \;\le\; \left(\frac{em}{d_{t,i}}\right)^{d_{t,i}} \;\le\; (em)^{d_{t,i}}.$$
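For completeness, the form of Sauer's lemma being invoked here (restated for convenience; cf. Section 6.5.1) is: for any hypothesis class $\mathcal{H}'$ with $\mathrm{VCdim}(\mathcal{H}') = d$ and any $m \ge d \ge 1$,
$$\tau_{\mathcal{H}'}(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d},$$
applied above with $\mathcal{H}'$ the class of homogeneous halfspaces over $\mathbb{R}^{d_{t,i}}$ and $d = d_{t,i}$.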
Overall, we obtained that
$$\tau_{\mathcal{H}}(m) \;\le\; (em)^{\sum_{t,i} d_{t,i}} \;=\; (em)^{|E|}.$$
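To sketch how the theorem follows from this bound (the remaining step of the argument): if a set $C$ of size $m$ is shattered by $\mathcal{H}$, then $2^m = \tau_{\mathcal{H}}(m) \le (em)^{|E|}$, hence
$$m \;\le\; |E|\log_2(em),$$
and a standard calculation then yields $m = O(|E|\log(|E|))$, which is the claimed bound on the VC dimension.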