Page 28 - Understanding Machine Learning
P. 28

Introduction
           10
                                       m
                 introduce the notation D to denote the probability over Z m  induced by sampling
                 (z 1 ,...,z m ) where each point z i is sampled from D independently of the other points.
                    In general, we have made an effort to avoid asymptotic notation. However, we
                 occasionally use it to clarify the main results. In particular, given f : R → R + and
                  g : R → R + we write f = O(g) if there exist x 0 ,α ∈ R + such that for all x > x 0 we
                 have f (x) ≤ αg(x). We write f = o(g) if for every α> 0 there exists x 0 such that for
                 all x > x 0 we have f (x) ≤ αg(x). We write f =  (g) if there exist x 0 ,α ∈ R + such that
                 for all x > x 0 we have f (x) ≥ αg(x). The notation f = ω(g) is defined analogously.
                 The notation f =  (g)meansthat f = O(g)and g = O( f ). Finally, the notation
                                                                            k
                  f = O(g) means that there exists k ∈ N such that f (x) = O(g(x)log (g(x))).
                      ˜
                    The inner product between vectors x and w is denoted by  x,w . Whenever we
                 do not specify the vector space we assume that it is the d-dimensional Euclidean
                                         d

                                            x
                 space and then  x,w =   i=1 i w i . The Euclidean (or   2 ) norm of a vector w is
                        √
                  w  2 =   w,w . We omit the subscript from the   2 norm when it is clear from the

                                                               p
                                                                 1/p
                 context. We also use other   p norms,  w  p =  |w i |  , and in particular  w  1 =
                                                           i

                    |w i | and  w  ∞ = max i |w i |.
                    i
                    We use the notation min x∈C f (x) to denote the minimum value of the set
                 { f (x): x ∈ C}. To be mathematically more precise, we should use inf x∈C f (x) when-
                 ever the minimum is not achievable. However, in the context of this book the
                 distinction between infimum and minimum is often of little interest. Hence, to sim-
                 plify the presentation, we sometimes use the min notation even when inf is more
                 adequate. An analogous remark applies to max versus sup.
   23   24   25   26   27   28   29   30   31   32   33