              will describe the logistic regression approach, which can be implemented efficiently
              even in the nonseparable case. We will study surrogate loss functions in more detail
              later on, in Chapter 12.


              9.1.1 Linear Programming for the Class of Halfspaces
              Linear programs (LP) are problems that can be expressed as maximizing a linear
              function subject to linear inequalities. That is,

              $$\max_{w \in \mathbb{R}^d} \ \langle u, w \rangle \quad \text{subject to} \quad Aw \ge v,$$

              where $w \in \mathbb{R}^d$ is the vector of variables we wish to determine, $A$ is an $m \times d$ matrix,
              and $v \in \mathbb{R}^m$, $u \in \mathbb{R}^d$ are vectors. Linear programs can be solved efficiently,$^1$ and
              furthermore, there are publicly available implementations of LP solvers.
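              As a concrete illustration (not part of the original text), here is a minimal sketch that passes an LP of this form to SciPy's linprog, one publicly available solver. Since linprog minimizes and uses $\le$ constraints, the problem $\max \langle u, w \rangle$ s.t. $Aw \ge v$ is handed over as $\min \langle -u, w \rangle$ s.t. $-Aw \le -v$; the matrices and vectors below are made-up toy values.

```python
import numpy as np
from scipy.optimize import linprog

# Toy problem data (illustrative only): max <u, w> subject to A w >= v.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])              # m x d constraint matrix
v = np.array([1.0, 2.0])                # right-hand side, length m
u = np.array([-1.0, -1.0])              # linear objective, length d

# linprog minimizes c^T w subject to A_ub @ w <= b_ub, so we negate both.
res = linprog(c=-u, A_ub=-A, b_ub=-v,
              bounds=[(None, None)] * A.shape[1],  # w is a free variable
              method="highs")
print(res.x)                            # a maximizer of <u, w> over {w : A w >= v}
```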
                 We will show that the ERM problem for halfspaces in the realizable case can be
              expressed as a linear program. For simplicity, we assume the homogenous case. Let
              $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$ be a training set of size $m$. Since we assume the realizable case, an
              ERM predictor should have zero errors on the training set. That is, we are looking
              for some vector $w \in \mathbb{R}^d$ for which

              $$\operatorname{sign}(\langle w, \mathbf{x}_i \rangle) = y_i, \quad \forall i = 1,\ldots,m.$$

              Equivalently, we are looking for some vector $w$ for which

              $$y_i \langle w, \mathbf{x}_i \rangle > 0, \quad \forall i = 1,\ldots,m.$$

              Let $w^*$ be a vector that satisfies this condition (it must exist since we assume real-
              izability). Define $\gamma = \min_i \, (y_i \langle w^*, \mathbf{x}_i \rangle)$ and let $\bar{w} = \frac{w^*}{\gamma}$. Therefore, for all $i$ we
              have

              $$y_i \langle \bar{w}, \mathbf{x}_i \rangle = \frac{1}{\gamma}\, y_i \langle w^*, \mathbf{x}_i \rangle \ge 1.$$

              We have thus shown that there exists a vector that satisfies

              $$y_i \langle w, \mathbf{x}_i \rangle \ge 1, \quad \forall i = 1,\ldots,m. \tag{9.1}$$

              And clearly, such a vector is an ERM predictor.
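              As a quick numerical illustration of this rescaling (a made-up example, not from the text), the snippet below takes a small separable sample and some separating $w^*$, computes $\gamma$, and checks that $\bar{w} = w^*/\gamma$ attains margin at least 1 on every example.

```python
import numpy as np

# Made-up separable data and a separating vector w*.
X = np.array([[ 2.0,  1.0],
              [ 1.0,  3.0],
              [-1.0, -1.0],
              [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_star = np.array([1.0, 1.0])     # satisfies y_i <w*, x_i> > 0 for all i
margins = y * (X @ w_star)        # the values y_i <w*, x_i>
gamma = margins.min()             # gamma = min_i y_i <w*, x_i>
w_bar = w_star / gamma            # the rescaled vector

print(y * (X @ w_bar) >= 1)       # all True: y_i <w_bar, x_i> >= 1
```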
                 To find a vector that satisfies Equation (9.1) we can rely on an LP solver as
              follows. Set $A$ to be the $m \times d$ matrix whose rows are the instances multiplied by $y_i$.
              That is, $A_{i,j} = y_i\, x_{i,j}$, where $x_{i,j}$ is the $j$'th element of the vector $\mathbf{x}_i$. Let $v$ be the
              vector $(1,\ldots,1) \in \mathbb{R}^m$. Then, Equation (9.1) can be rewritten as

              $$Aw \ge v.$$

                 The LP form requires a maximization objective, yet all the $w$ that satisfy the
              constraints are equally good candidates as output hypotheses. Thus, we set a "dummy"
              objective, $u = (0,\ldots,0) \in \mathbb{R}^d$.
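              Putting the pieces together, a minimal sketch of this reduction (using SciPy's linprog as one possible LP solver, on made-up toy data) could look as follows. As before, $Aw \ge v$ is passed to the solver as $-Aw \le -v$, and the all-zeros vector plays the role of the dummy objective $u$.

```python
import numpy as np
from scipy.optimize import linprog

# A small, made-up linearly separable training set S = {(x_i, y_i)}.
X = np.array([[ 2.0,  1.0],
              [ 1.0,  3.0],
              [-1.0, -1.0],
              [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, d = X.shape

A = y[:, None] * X               # A_{i,j} = y_i * x_{i,j}
v = np.ones(m)                   # the vector (1,...,1)
u = np.zeros(d)                  # the "dummy" objective

# linprog minimizes c^T w subject to A_ub @ w <= b_ub,
# so the constraint A w >= v is passed as -A w <= -v.
res = linprog(c=u, A_ub=-A, b_ub=-v,
              bounds=[(None, None)] * d,   # each coordinate of w is free
              method="highs")
w = res.x

print(np.sign(X @ w) == y)       # all True: the returned halfspace is an ERM predictor
```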
              $^1$ Namely, in time polynomial in $m$, $d$, and in the representation size of real numbers.