As an example, let us consider again the drug company that developed a new
drug. On the basis of past experience, the statisticians at the drug company believe
that whenever a drug has reached the level of clinical experiments on people, it is
likely to be effective. They model this prior belief by defining a density distribution
on θ such that
$$
P[\theta] =
\begin{cases}
0.8 & \text{if } \theta > 0.5 \\
0.2 & \text{if } \theta \le 0.5
\end{cases}
\qquad (24.15)
$$
As before, given a specific value of θ, it is assumed that the conditional probability,
P[X = x|θ], is known. In the drug company example, X takes values in {0,1} and
$$P[X = x|\theta] = \theta^{x} (1 - \theta)^{1-x}.$$
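Written out for the two possible outcomes, this conditional probability is simply

$$P[X = 1|\theta] = \theta, \qquad P[X = 0|\theta] = 1 - \theta.$$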
Once the prior distribution over θ and the conditional distribution over X given
θ are defined, we again have complete knowledge of the distribution over X. This is
because we can write the probability over X as a marginal probability

$$P[X = x] = \sum_{\theta} P[X = x, \theta] = \sum_{\theta} P[\theta]\, P[X = x|\theta],$$
where the last equality follows from the definition of conditional probability. If θ
is continuous we replace P[θ] with the density function and the sum becomes an
integral:
$$P[X = x] = \int_{\theta} P[\theta]\, P[X = x|\theta]\, d\theta.$$
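To see the marginal computation in action, here is a small worked instance with a hypothetical two-point prior (illustrative numbers, not the density of (24.15)): suppose θ takes only the values 0.25 and 0.75, with P[θ = 0.25] = 0.2 and P[θ = 0.75] = 0.8. Then

$$P[X = 1] = \sum_{\theta} P[\theta]\, P[X = 1|\theta] = 0.2 \cdot 0.25 + 0.8 \cdot 0.75 = 0.65.$$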
Seemingly, once we know P[X = x], a training set S = (x_1, ..., x_m) tells us nothing,
as we are already experts who know the distribution over a new point X. However,
the Bayesian view introduces a dependency between S and X. This is because we
now refer to θ as a random variable. A new point X and the previous points in S are
independent only conditioned on θ. This is different from the frequentist philosophy,
in which θ is a parameter that we might not know, but since it is just a parameter of
the distribution, a new point X and the previous points S are always independent.
In the Bayesian framework, since X and S are no longer independent, what
we would like to calculate is the probability of X given S, which by the chain rule
can be written as follows:

$$P[X = x|S] = \sum_{\theta} P[X = x|\theta, S]\, P[\theta|S] = \sum_{\theta} P[X = x|\theta]\, P[\theta|S].$$
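As with the marginal above, when θ is continuous the sums are replaced by integrals against the posterior density:

$$P[X = x|S] = \int_{\theta} P[X = x|\theta, S]\, P[\theta|S]\, d\theta = \int_{\theta} P[X = x|\theta]\, P[\theta|S]\, d\theta.$$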
In both cases, the second equality follows from the assumption that X and S are independent
when we condition on θ. Using Bayes' rule we have
$$P[\theta|S] = \frac{P[S|\theta]\, P[\theta]}{P[S]},$$
and together with the assumption that points are independent conditioned on θ, we
can write

$$P[\theta|S] = \frac{P[S|\theta]\, P[\theta]}{P[S]} = \frac{1}{P[S]} \prod_{i=1}^{m} P[X = x_i|\theta]\, P[\theta].$$
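The following is a minimal numerical sketch of this computation for the Bernoulli example, approximating the integrals over θ on a grid. The grid resolution, the sample S, and the renormalization of the prior of (24.15) on the grid are illustrative assumptions, not part of the text.

import numpy as np

# A minimal numerical sketch of the Bayesian computation above for the
# Bernoulli example.  Grid resolution, the sample S, and the prior
# renormalization are illustrative assumptions.

theta = np.linspace(0.001, 0.999, 1000)   # grid over the parameter
d_theta = theta[1] - theta[0]

# Prior density from (24.15), renormalized on the grid to integrate to 1.
prior = np.where(theta > 0.5, 0.8, 0.2)
prior = prior / (prior * d_theta).sum()

# A hypothetical training set S of i.i.d. Bernoulli outcomes.
S = np.array([1, 1, 0, 1, 1, 0, 1, 1])

# Likelihood P[S | theta] = prod_i theta^{x_i} (1 - theta)^{1 - x_i}.
likelihood = theta ** S.sum() * (1 - theta) ** (len(S) - S.sum())

# Posterior P[theta | S] = P[S | theta] P[theta] / P[S]; dividing by the
# grid sum plays the role of the normalizer P[S].
posterior = likelihood * prior
posterior = posterior / (posterior * d_theta).sum()

# Posterior predictive P[X = 1 | S] = integral of theta * P[theta | S].
p_next = (theta * posterior * d_theta).sum()
print(f"P[X = 1 | S] is approximately {p_next:.3f}")

Running the sketch shows how the posterior concentrates around the empirical frequency of 1's in S while still being pulled toward the region θ > 0.5 favored by the prior.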