Page 314 - Understanding Machine Learning

Generative Models
                 using the drug. To do so, the drug company sampled a training set of m people and
gave them the drug. Let $S = (x_1, \ldots, x_m)$ denote the training set, where for each i,
$x_i = 1$ if the ith person survived and $x_i = 0$ otherwise. We can model the underlying
distribution using a single parameter, θ ∈ [0,1], indicating the probability of survival.
                    We now would like to estimate the parameter θ on the basis of the training set S.
                 A natural idea is to use the average number of 1’s in S as an estimator. That is,
$$\hat{\theta} = \frac{1}{m}\sum_{i=1}^{m} x_i. \qquad (24.1)$$
Clearly, $\mathbb{E}_S[\hat{\theta}] = \theta$. That is, $\hat{\theta}$ is an unbiased estimator of θ. Furthermore, since $\hat{\theta}$ is
the average of m i.i.d. binary random variables, we can use Hoeffding's inequality to
get that with probability of at least 1 − δ over the choice of S we have that

$$|\hat{\theta} - \theta| \le \sqrt{\frac{\log(2/\delta)}{2m}}. \qquad (24.2)$$
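To make the estimator and its bound concrete, here is a minimal simulation sketch (not from the book; the true survival probability θ = 0.7, the sample size m, and the confidence level δ are illustrative assumptions):

```python
import numpy as np

# Illustrative sketch (assumed values, not from the book): simulate a
# Bernoulli(theta) training set, compute the empirical-mean estimator of
# Equation (24.1), and evaluate the Hoeffding bound of Equation (24.2).
rng = np.random.default_rng(0)
theta = 0.7            # true survival probability (unknown in practice)
m, delta = 10_000, 0.05

S = rng.binomial(1, theta, size=m)             # x_i in {0, 1}
theta_hat = S.mean()                           # estimator (24.1)
bound = np.sqrt(np.log(2 / delta) / (2 * m))   # Hoeffding bound (24.2)

# With probability at least 1 - delta over the draw of S,
# |theta_hat - theta| <= bound.
print(theta_hat, bound)
```

Note how the bound shrinks as $1/\sqrt{m}$: quadrupling the sample size halves the guaranteed estimation error.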
Another interpretation of $\hat{\theta}$ is as the Maximum Likelihood Estimator, as we
formally explain now. We first write the probability of generating the sample S:
$$\mathbb{P}[S=(x_1,\ldots,x_m)] = \prod_{i=1}^{m} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_i x_i}(1-\theta)^{\sum_i (1-x_i)}.$$
                 We define the log likelihood of S, given the parameter θ, as the log of the preceding
                 expression:


$$L(S;\theta) = \log \mathbb{P}[S=(x_1,\ldots,x_m)] = \log(\theta)\sum_i x_i + \log(1-\theta)\sum_i (1-x_i).$$
                 The maximum likelihood estimator is the parameter that maximizes the likelihood
$$\hat{\theta} \in \operatorname*{argmax}_{\theta} L(S;\theta). \qquad (24.3)$$
Next, we show that in our case, Equation (24.1) is a maximum likelihood estimator.
To see this, we take the derivative of L(S;θ) with respect to θ and equate it to zero:

$$\frac{\sum_i x_i}{\theta} - \frac{\sum_i (1-x_i)}{1-\theta} = 0.$$
                 Solving the equation for θ we obtain the estimator given in Equation (24.1).
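Filling in the algebra: since $\sum_i x_i + \sum_i (1-x_i) = m$, the stationarity condition rearranges to the empirical average:

$$\frac{\sum_i x_i}{\theta} = \frac{\sum_i (1-x_i)}{1-\theta}
\;\Longrightarrow\; (1-\theta)\sum_i x_i = \theta\sum_i (1-x_i)
\;\Longrightarrow\; \sum_i x_i = \theta m
\;\Longrightarrow\; \hat{\theta} = \frac{1}{m}\sum_{i=1}^{m} x_i.$$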

24.1.1 Maximum Likelihood Estimation for Continuous Random Variables
Let X be a continuous random variable. Then, for most x ∈ ℝ we have P[X = x] = 0
and therefore the definition of likelihood as given before is trivialized. To overcome
this technical problem we define the likelihood as the log of the density of the probability
of X at x. That is, given an i.i.d. training set $S = (x_1, \ldots, x_m)$ sampled according
to a density distribution $P_\theta$, we define the likelihood of S given θ as

$$L(S;\theta) = \log \prod_{i=1}^{m} P_\theta(x_i) = \sum_{i=1}^{m} \log(P_\theta(x_i)).$$
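As a concrete illustration (an assumed model for this sketch, not the book's example), take $P_\theta$ to be the Gaussian density $N(\theta, 1)$; the log likelihood above is then maximized at the sample mean, which we can verify numerically:

```python
import numpy as np

# Sketch under an assumed model: P_theta is the N(theta, 1) density, so
# log P_theta(x) = -0.5 * (x - theta)^2 - 0.5 * log(2 * pi).
rng = np.random.default_rng(1)
S = rng.normal(loc=2.0, scale=1.0, size=1000)  # simulated training set

def log_likelihood(S, theta):
    """L(S; theta) = sum_i log P_theta(x_i) for the N(theta, 1) density."""
    return np.sum(-0.5 * (S - theta) ** 2 - 0.5 * np.log(2 * np.pi))

# Maximize over a grid of candidate theta values; the grid maximizer
# should match the sample mean, the closed-form maximum likelihood
# estimate for this model.
grid = np.linspace(0.0, 4.0, 401)
best = grid[np.argmax([log_likelihood(S, t) for t in grid])]
print(best, S.mean())
```

The grid search stands in for the calculus: differentiating the Gaussian log likelihood with respect to θ and equating to zero gives exactly the sample mean, mirroring the Bernoulli derivation above.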