R: G, S → A,
and that R applies in a sequence of situations. Suppose further that A turns
out to be the right thing to do in some of those situations but not in others.
Its execution history then contains a set of situations, {Sp} = Sp₁, Sp₂,
Sp₃, …, in which action A generated positive outcomes, and another set of
situations, {Sn} = Sn₁, Sn₂, Sn₃, …, in which the same action generated
negative outcomes.
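In code, a situation can be modeled as a set of features and the rule's
execution history as two outcome-indexed lists of situations. The following
Python sketch is purely illustrative; the feature names are invented:

```python
# Hypothetical execution history for rule R. Each situation is a set
# of observed features (the names here are invented for illustration).
Sp = [{"goal", "red", "large"},   # Sp1: action A had a positive outcome
      {"goal", "red", "small"}]   # Sp2: positive outcome again
Sn = [{"goal", "blue", "large"}]  # Sn1: the same action failed here
```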
The natural inference is that there is some factor that distinguishes these
two sets of situations. The condition of the rule needs to be augmented by some
feature that discriminates between {Sp} and {Sn}. The question is how to iden-
tify the relevant feature or features. The latter should be true of all members of
{Sp} but not of any member of {Sn}. It can be identified by first extracting the
list of all features that are shared by all the members of {Sp}. The second step
is to delete from that list every feature that is also true of at least one member
in {Sn}. The features that remain differentiate the situations with positive out-
comes from those with negative outcomes. Call that set of features {f}. Finally,
add those features to S to produce a new rule R′:
R′: G, S & {f} → A
This rule recommends action A only in those situations that exhibit the dif-
ferentiating features. There is no guarantee that the new rule will never gen-
erate errors, but there is some probability that it will avoid the type of error
produced in the past.
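The two-step computation of {f} is easy to state in code. Here is a minimal
Python sketch, continuing the hypothetical sets Sp and Sn above (the
discriminate helper is invented for illustration):

```python
from functools import reduce

def discriminate(positive, negative):
    """Return the features true of every positive situation
    but of no negative situation -- the set {f}."""
    # Step 1: extract the features shared by all members of {Sp}.
    shared = reduce(set.intersection, (set(s) for s in positive))
    # Step 2: delete every feature also true of at least one member of {Sn}.
    return shared - set().union(*negative)

f = discriminate(Sp, Sn)       # -> {"red"} for the sets above
# Condition side of the specialized rule R': G, S & {f} -> A
specialized_condition = {"goal"} | f
```

Only "red" survives both steps here, so the specialized rule fires in red
situations only and would have stayed silent in Sn₁.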
There are multiple difficulties with this model of learning from error.
If there is more than one discriminating feature, this mechanism provides
no systematic way of choosing among them. The options are to add all dis-
criminating features to the condition side of the responsible rule, as in the
schematic example earlier, or to create multiple new rules, each including
one of the discriminating features. Given that two sets of situations might
differ with respect to hundreds of features, these are unattractive options.
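In the same hypothetical notation, the two options look like this:

```python
# Option 1: fold every discriminating feature into the one rule's condition.
rule_all = {"condition": {"goal"} | f, "action": "A"}

# Option 2: create multiple new rules, one per discriminating feature.
rules_per_feature = [{"condition": {"goal", feat}, "action": "A"}
                     for feat in f]
```

With hundreds of candidate features, the first yields an unwieldy condition
side and the second a flood of near-duplicate rules.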
In addition, discrimination, computed this way, makes implausible demands
on memory. The learner’s brain has to encode into long-term memory every
application of every rule. There is no way of knowing in advance which fea-
tures of a situation will turn out to be crucial for future discriminations, so
each memory trace has to be quite detailed. Like all other inductive pro-
cesses, discrimination lacks a criterion for deciding how much evidence is
enough. How many situations of either type have to be in memory before