introduce the notation $\mathcal{D}^m$ to denote the probability over $Z^m$ induced by sampling $(z_1,\ldots,z_m)$ where each point $z_i$ is sampled from $\mathcal{D}$ independently of the other points.
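As a concrete illustration (a minimal sketch added here, not from the book), the following Python snippet draws a sample $(z_1,\ldots,z_m)$ of $m$ points i.i.d. from a distribution $\mathcal{D}$; the standard normal merely stands in for $\mathcal{D}$, which the text leaves abstract.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def sample_iid(m, draw=lambda rng: rng.standard_normal()):
        # Each z_i is drawn from D independently of the others,
        # so the tuple (z_1, ..., z_m) is distributed according to D^m.
        return [draw(rng) for _ in range(m)]

    S = sample_iid(5)  # a sample of m = 5 i.i.d. points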
In general, we have made an effort to avoid asymptotic notation. However, we occasionally use it to clarify the main results. In particular, given $f : \mathbb{R} \to \mathbb{R}_+$ and $g : \mathbb{R} \to \mathbb{R}_+$ we write $f = O(g)$ if there exist $x_0, \alpha \in \mathbb{R}_+$ such that for all $x > x_0$ we have $f(x) \le \alpha g(x)$. We write $f = o(g)$ if for every $\alpha > 0$ there exists $x_0$ such that for all $x > x_0$ we have $f(x) \le \alpha g(x)$. We write $f = \Omega(g)$ if there exist $x_0, \alpha \in \mathbb{R}_+$ such that for all $x > x_0$ we have $f(x) \ge \alpha g(x)$. The notation $f = \omega(g)$ is defined analogously. The notation $f = \Theta(g)$ means that $f = O(g)$ and $g = O(f)$. Finally, the notation $f = \tilde{O}(g)$ means that there exists $k \in \mathbb{N}$ such that $f(x) = O\big(g(x)\log^k(g(x))\big)$.
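As a small worked example of these definitions (added here for illustration), take $f(x) = 3x^2 + 5x$ and $g(x) = x^2$:

\[
x > x_0 = 5 \;\Rightarrow\; 5x < x^2 \;\Rightarrow\; f(x) = 3x^2 + 5x \le 4x^2 = \alpha g(x) \quad \text{with } \alpha = 4,
\]

so $f = O(g)$; since also $g(x) \le f(x)$ for all $x > 0$, we have $g = O(f)$ and hence $f = \Theta(g)$.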
The inner product between vectors $x$ and $w$ is denoted by $\langle x, w\rangle$. Whenever we do not specify the vector space we assume that it is the $d$-dimensional Euclidean space and then $\langle x, w\rangle = \sum_{i=1}^{d} x_i w_i$. The Euclidean (or $\ell_2$) norm of a vector $w$ is $\|w\|_2 = \sqrt{\langle w, w\rangle}$. We omit the subscript from the $\ell_2$ norm when it is clear from the context. We also use other $\ell_p$ norms, $\|w\|_p = \left(\sum_i |w_i|^p\right)^{1/p}$, and in particular $\|w\|_1 = \sum_i |w_i|$ and $\|w\|_\infty = \max_i |w_i|$.
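The following Python sketch (using NumPy; the vectors are arbitrary placeholders) computes each of these quantities directly from the definitions:

    import numpy as np

    x = np.array([1.0, -2.0, 3.0])
    w = np.array([0.5, 1.0, -1.0])

    inner = np.dot(x, w)                      # <x, w> = sum_i x_i * w_i
    l2    = np.sqrt(np.dot(w, w))             # ||w||_2 = sqrt(<w, w>)
    p     = 3
    lp    = np.sum(np.abs(w) ** p) ** (1 / p) # ||w||_p = (sum_i |w_i|^p)^(1/p)
    l1    = np.sum(np.abs(w))                 # ||w||_1 = sum_i |w_i|
    linf  = np.max(np.abs(w))                 # ||w||_inf = max_i |w_i|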
We use the notation $\min_{x \in C} f(x)$ to denote the minimum value of the set $\{f(x) : x \in C\}$. To be mathematically more precise, we should use $\inf_{x \in C} f(x)$ whenever the minimum is not achievable. However, in the context of this book the distinction between infimum and minimum is often of little interest. Hence, to simplify the presentation, we sometimes use the min notation even when inf is more adequate. An analogous remark applies to max versus sup.
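For instance (a standard example, not from the text), taking $f(x) = 1/x$ over $C = (0, \infty)$ gives $\inf_{x \in C} f(x) = 0$, yet no $x \in C$ attains this value; under the convention above one may nevertheless write $\min_{x \in C} f(x) = 0$.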