20.9 Exercises


                  Finally, show that a neural network can first decide which box the input vector
                  belongs to, and then predict the averaged value of f at that box.
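                   For intuition, here is a minimal numeric sketch of this construction (our own
                   illustration, not the book's solution), with ordinary Python code standing in for
                   the network: partition [−1,1]^n into boxes of side ε and output one fixed value
                   per box. We use f at the box's lower corner as a stand-in for the averaged value
                   of f over the box; for a 1-Lipschitz f the two differ by at most ε√n.

                      import numpy as np

                      # Sketch: approximate a Lipschitz f on [-1,1]^n by a function that first
                      # decides which box of side eps the input lies in, then outputs a constant
                      # value associated with that box.
                      def box_lower_corner(x, eps):
                          # decide which box contains x, represented by its lower corner
                          return -1.0 + eps * np.floor((x + 1.0) / eps)

                      def piecewise_constant(f, eps):
                          # f at the corner stands in for the average of f over the box
                          return lambda x: f(box_lower_corner(x, eps))

                      f = lambda x: np.sin(x).sum() / x.size      # a 1-Lipschitz test function
                      h = piecewise_constant(f, eps=0.05)
                      x = np.random.uniform(-1.0, 1.0, size=4)
                      print(abs(f(x) - h(x)))                     # small, on the order of eps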
              20.2 Prove Theorem 20.5.
                   Hint: For every f : {−1,1}^n → {−1,1}, construct a 1-Lipschitz function
                   g : [−1,1]^n → [−1,1] such that if you can approximate g then you can express f.
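                   One candidate construction, our own illustration since the exercise leaves the
                   choice open: take g(x) = f(sign(x)) · min_{i∈[n]} |x_i|, breaking the sign
                   arbitrarily at zero. This g vanishes whenever some coordinate is zero, so it is
                   continuous; it is 1-Lipschitz (with respect to the ℓ_∞ norm, hence also ℓ_2);
                   and g(x) = f(x) for every x ∈ {−1,1}^n, so approximating g to accuracy smaller
                   than 1 at these points recovers f by taking signs.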
               20.3 Growth function of product: For i = 1, 2, let F_i be a set of functions from X to Y_i.
                   Define H = F_1 × F_2 to be the Cartesian product class. That is, for every f_1 ∈ F_1
                   and f_2 ∈ F_2, there exists h ∈ H such that h(x) = (f_1(x), f_2(x)). Prove that
                   τ_H(m) ≤ τ_{F_1}(m) · τ_{F_2}(m).
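                   A sketch of the intended counting argument, in our phrasing: for any finite
                   C = {x_1,...,x_m} ⊆ X, the restriction of each h ∈ H to C is determined by the
                   pair of restrictions of f_1 and f_2 to C, so
                   |H_C| ≤ |(F_1)_C| · |(F_2)_C| ≤ τ_{F_1}(m) · τ_{F_2}(m); taking the maximum
                   over all C of size m gives the claim.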
               20.4 Growth function of composition: Let F_1 be a set of functions from X to Z and let
                   F_2 be a set of functions from Z to Y. Let H = F_2 ∘ F_1 be the composition class.
                   That is, for every f_1 ∈ F_1 and f_2 ∈ F_2, there exists h ∈ H such that
                   h(x) = f_2(f_1(x)). Prove that τ_H(m) ≤ τ_{F_2}(m) · τ_{F_1}(m).
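                   The same style of counting works here, again in our phrasing: fix
                   C = {x_1,...,x_m}. The restriction of h = f_2 ∘ f_1 to C is determined by first
                   fixing the restriction of f_1 to C (at most τ_{F_1}(m) possibilities) and then the
                   restriction of f_2 to the image f_1(C), a set of at most m points of Z (at most
                   τ_{F_2}(m) possibilities). Hence |H_C| ≤ τ_{F_2}(m) · τ_{F_1}(m).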
               20.5 VC of sigmoidal networks: In this exercise we show that there is a graph (V, E)
                   such that the VC dimension of the class of neural networks over these graphs with
                   the sigmoid activation function is Ω(|E|^2). Note that for every ε > 0, the sigmoid
                   activation function can approximate the threshold activation function,
                   1_[∑_i x_i > 0], up to accuracy ε. To simplify the presentation, throughout the
                   exercise we assume that we can exactly implement the activation function
                   1_[∑_i x_i > 0] using a sigmoid activation function.
                  Fix some n.
                   1. Construct a network, N_1, with O(n) weights, which implements a function from
                      ℝ to {0,1}^n and satisfies the following property: for every x ∈ {0,1}^n, if we
                      feed the network the real number 0.x_1 x_2 ... x_n, then the output of the
                      network will be x.
                      Hint: Denote α = 0.x_1 x_2 ... x_n and observe that 10^k α − 0.5, once the
                      integer contribution of the digits x_1,...,x_{k−1} has been subtracted, is at
                      least 0.5 if x_k = 1 and is at most −0.3 if x_k = 0. (A numeric check of this
                      arithmetic is sketched after this list.)
                   2. Construct a network, N_2, with O(n) weights, which implements a function from
                      [n] to {0,1}^n such that N_2(i) = e_i for all i. That is, upon receiving the
                      input i, the network outputs the vector of all zeros except a 1 at the i'th
                      neuron.
                   3. Let α_1,...,α_n be n real numbers such that every α_i is of the form
                      0.a_1^(i) a_2^(i) ... a_n^(i), with a_j^(i) ∈ {0,1}. Construct a network, N_3,
                      with O(n) weights, which implements a function from [n] to ℝ, and satisfies
                      N_3(i) = α_i for every i ∈ [n].
                   4. Combine N_1, N_3 to obtain a network that receives i ∈ [n] and outputs a^(i).
                   5. Construct a network N_4 that receives (i, j) ∈ [n] × [n] and outputs a_j^(i).
                      Hint: Observe that the AND function over {0,1}^2 can be calculated using O(1)
                      weights.
                   6. Conclude that there is a graph with O(n) weights such that the VC dimension
                      of the resulting hypothesis class is n^2.
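                   For part 1's hint, here is a short numeric check (our own sketch in plain Python,
                   not a construction from the book): each step shifts the next digit of α into the
                   integer position, thresholds against 0.5 with the stated margin, and subtracts the
                   digit just read off, which is exactly what a chain of O(1)-weight threshold units
                   per digit can implement.

                      # Numeric check of the hint in part 1 (sketch; digits of alpha are 0 or 1).
                      def extract_bits(alpha, n):
                          r = alpha
                          bits = []
                          for _ in range(n):
                              r *= 10                        # shift next digit into the integer position
                              b = 1 if r - 0.5 > 0 else 0    # threshold unit: 10r - 0.5 >= 0.5 when the
                                                             # digit is 1, <= about -0.39 when it is 0
                              bits.append(b)
                              r -= b                         # subtract the digit just extracted
                          return bits

                      x = [1, 0, 1, 1, 0]
                      alpha = sum(b * 10.0 ** -(k + 1) for k, b in enumerate(x))   # alpha = 0.10110
                      assert extract_bits(alpha, len(x)) == x

                   Each iteration costs a constant number of weights, so extracting all n digits
                   costs O(n) weights, as part 1 requires.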
              20.6 Prove Theorem 20.7.
                  Hint: The proof is similar to the hardness of learning intersections of halfspaces –
                  see Exercise 32 in Chapter 8.