20.9 Exercises


                  Finally, show that a neural network can first decide which box the input vector
                  belongs to, and then predict the averaged value of f at that box.
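                   For intuition, here is a minimal numeric sketch of this construction (our own
                   illustration, not the book's solution), with ordinary Python code standing in for
                   the network: partition [−1,1]^n into boxes of side ε and output one fixed value
                   per box. We use f at the box's lower corner as a stand-in for the averaged value
                   of f over the box; for a 1-Lipschitz f the two differ by at most ε√n.

                      import numpy as np

                      # Sketch: approximate a Lipschitz f on [-1,1]^n by a function that first
                      # decides which box of side eps the input lies in, then outputs a constant
                      # value associated with that box.
                      def box_lower_corner(x, eps):
                          # decide which box contains x, represented by its lower corner
                          return -1.0 + eps * np.floor((x + 1.0) / eps)

                      def piecewise_constant(f, eps):
                          # f at the corner stands in for the average of f over the box
                          return lambda x: f(box_lower_corner(x, eps))

                      f = lambda x: np.sin(x).sum() / x.size      # a 1-Lipschitz test function
                      h = piecewise_constant(f, eps=0.05)
                      x = np.random.uniform(-1.0, 1.0, size=4)
                      print(abs(f(x) - h(x)))                     # small, on the order of eps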
              20.2 Prove Theorem 20.5.
                   Hint: For every f : {−1,1}^n → {−1,1}, construct a 1-Lipschitz function
                   g : [−1,1]^n → [−1,1] such that if you can approximate g then you can express f.
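                   One candidate construction, our own illustration since the exercise leaves the
                   choice open: take g(x) = f(sign(x)) · min_{i∈[n]} |x_i|, breaking the sign
                   arbitrarily at zero. This g vanishes whenever some coordinate is zero, so it is
                   continuous; it is 1-Lipschitz (with respect to the ℓ_∞ norm, hence also ℓ_2);
                   and g(x) = f(x) for every x ∈ {−1,1}^n, so approximating g to accuracy smaller
                   than 1 at these points recovers f by taking signs.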
               20.3 Growth function of product: For i = 1, 2, let F_i be a set of functions from X to Y_i.
                   Define H = F_1 × F_2 to be the Cartesian product class. That is, for every f_1 ∈ F_1
                   and f_2 ∈ F_2, there exists h ∈ H such that h(x) = (f_1(x), f_2(x)). Prove that
                   τ_H(m) ≤ τ_{F_1}(m) · τ_{F_2}(m).
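                   A sketch of the intended counting argument, in our phrasing: for any finite
                   C = {x_1,...,x_m} ⊆ X, the restriction of each h ∈ H to C is determined by the
                   pair of restrictions of f_1 and f_2 to C, so
                   |H_C| ≤ |(F_1)_C| · |(F_2)_C| ≤ τ_{F_1}(m) · τ_{F_2}(m); taking the maximum
                   over all C of size m gives the claim.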
               20.4 Growth function of composition: Let F_1 be a set of functions from X to Z and let
                   F_2 be a set of functions from Z to Y. Let H = F_2 ∘ F_1 be the composition class.
                   That is, for every f_1 ∈ F_1 and f_2 ∈ F_2, there exists h ∈ H such that
                   h(x) = f_2(f_1(x)). Prove that τ_H(m) ≤ τ_{F_2}(m) · τ_{F_1}(m).
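                   The same style of counting works here, again in our phrasing: fix
                   C = {x_1,...,x_m}. The restriction of h = f_2 ∘ f_1 to C is determined by first
                   fixing the restriction of f_1 to C (at most τ_{F_1}(m) possibilities) and then the
                   restriction of f_2 to the image f_1(C), a set of at most m points of Z (at most
                   τ_{F_2}(m) possibilities). Hence |H_C| ≤ τ_{F_2}(m) · τ_{F_1}(m).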
               20.5 VC of sigmoidal networks: In this exercise we show that there is a graph (V, E)
                   such that the VC dimension of the class of neural networks over these graphs with
                   the sigmoid activation function is Ω(|E|^2). Note that for every ε > 0, the sigmoid
                   activation function can approximate the threshold activation function,
                   1_[∑_i x_i > 0], up to accuracy ε. To simplify the presentation, throughout the
                   exercise we assume that we can exactly implement the activation function
                   1_[∑_i x_i > 0] using a sigmoid activation function.
                  Fix some n.
                   1. Construct a network, N_1, with O(n) weights, which implements a function from
                      ℝ to {0,1}^n and satisfies the following property: for every x ∈ {0,1}^n, if we
                      feed the network the real number 0.x_1 x_2 ... x_n, then the output of the
                      network will be x.
                      Hint: Denote α = 0.x_1 x_2 ... x_n and observe that 10^k α − 0.5, once the
                      integer contribution of the digits x_1,...,x_{k−1} has been subtracted, is at
                      least 0.5 if x_k = 1 and is at most −0.3 if x_k = 0. (A numeric check of this
                      arithmetic is sketched after this list.)
                   2. Construct a network, N_2, with O(n) weights, which implements a function from
                      [n] to {0,1}^n such that N_2(i) = e_i for all i. That is, upon receiving the
                      input i, the network outputs the vector of all zeros except a 1 at the i'th
                      neuron.
                   3. Let α_1,...,α_n be n real numbers such that every α_i is of the form
                      0.a_1^(i) a_2^(i) ... a_n^(i), with a_j^(i) ∈ {0,1}. Construct a network, N_3,
                      with O(n) weights, which implements a function from [n] to ℝ, and satisfies
                      N_3(i) = α_i for every i ∈ [n].
                   4. Combine N_1, N_3 to obtain a network that receives i ∈ [n] and outputs a^(i).
                   5. Construct a network N_4 that receives (i, j) ∈ [n] × [n] and outputs a_j^(i).
                      Hint: Observe that the AND function over {0,1}^2 can be calculated using O(1)
                      weights.
                   6. Conclude that there is a graph with O(n) weights such that the VC dimension
                      of the resulting hypothesis class is n^2.
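                   For part 1's hint, here is a short numeric check (our own sketch in plain Python,
                   not a construction from the book): each step shifts the next digit of α into the
                   integer position, thresholds against 0.5 with the stated margin, and subtracts the
                   digit just read off, which is exactly what a chain of O(1)-weight threshold units
                   per digit can implement.

                      # Numeric check of the hint in part 1 (sketch; digits of alpha are 0 or 1).
                      def extract_bits(alpha, n):
                          r = alpha
                          bits = []
                          for _ in range(n):
                              r *= 10                        # shift next digit into the integer position
                              b = 1 if r - 0.5 > 0 else 0    # threshold unit: 10r - 0.5 >= 0.5 when the
                                                             # digit is 1, <= about -0.39 when it is 0
                              bits.append(b)
                              r -= b                         # subtract the digit just extracted
                          return bits

                      x = [1, 0, 1, 1, 0]
                      alpha = sum(b * 10.0 ** -(k + 1) for k, b in enumerate(x))   # alpha = 0.10110
                      assert extract_bits(alpha, len(x)) == x

                   Each iteration costs a constant number of weights, so extracting all n digits
                   costs O(n) weights, as part 1 requires.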
              20.6 Prove Theorem 20.7.
                  Hint: The proof is similar to the hardness of learning intersections of halfspaces –
                  see Exercise 32 in Chapter 8.