    R: G, S → A,
and that R applies in a sequence of situations. Suppose further that A turns out to be the right thing to do in some of those situations but not in others. Its execution history then contains a set of situations, {Sp} = Sp₁, Sp₂, Sp₃, …, in which action A generated positive outcomes, and another set of situations, {Sn} = Sn₁, Sn₂, Sn₃, …, in which the same action generated negative outcomes.
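
As a concrete sketch (my illustration, not the book's), a rule of this form and its execution history can be encoded with situations represented as sets of discrete features. The Rule class, the feature-set encoding, and the record method are all hypothetical choices made for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    goal: str                 # G: the goal the rule serves
    condition: set            # S: features that must hold for the rule to apply
    action: str               # A: the action the rule recommends
    positives: list = field(default_factory=list)  # {Sp}: situations where A succeeded
    negatives: list = field(default_factory=list)  # {Sn}: situations where A failed

    def record(self, situation: set, positive: bool) -> None:
        """Log one application of the rule together with its observed outcome."""
        (self.positives if positive else self.negatives).append(situation)
```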
The natural inference is that there is some factor that distinguishes these two sets of situations. The condition of the rule needs to be augmented by some feature that discriminates between {Sp} and {Sn}. The question is how to identify the relevant feature or features. The latter should be true of all members of {Sp} but not of any member of {Sn}. They can be identified by first extracting the list of all features that are shared by all the members of {Sp}. The second step is to delete from that list every feature that is also true of at least one member of {Sn}. The features that remain differentiate the situations with positive outcomes from those with negative outcomes. Call that set of features {f}. Finally, add those features to S to produce a new rule R′:


    R′: G, S & {f} → A
This rule recommends action A only in those situations that exhibit the differentiating features. There is no guarantee that the new rule will never generate errors, but there is some probability that it will avoid the type of error produced in the past.
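
The two-step computation just described can be sketched directly against the hypothetical Rule encoding above: intersect the features shared by every member of {Sp}, delete any feature that also occurs in some member of {Sn}, and conjoin the surviving set {f} to the old condition. The specialize function is my naming, not the source's:

```python
def specialize(rule: Rule) -> Rule:
    """Return R′: the rule with the differentiating features {f} added to its condition."""
    shared = set.intersection(*rule.positives)  # step 1: features common to all of {Sp}
    for situation in rule.negatives:            # step 2: drop any feature that is also
        shared -= situation                     #         true of some member of {Sn}
    return Rule(rule.goal, rule.condition | shared, rule.action)  # R′: G, S & {f} → A
```

For example, with two positive and one negative recorded situation, only door_unlocked survives both steps and is added to the condition:

```python
r = Rule(goal="enter-room", condition={"at_door"}, action="push-door")
r.record({"at_door", "door_unlocked", "light_on"}, positive=True)
r.record({"at_door", "door_unlocked"}, positive=True)
r.record({"at_door", "door_locked", "light_on"}, positive=False)
print(sorted(specialize(r).condition))  # ['at_door', 'door_unlocked']
```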
There are multiple difficulties with this model of learning from error. If there is more than one discriminating feature, this mechanism provides no systematic way of choosing among them. The options are to add all discriminating features to the condition side of the responsible rule, as in the schematic example earlier, or to create multiple new rules, each including one of the discriminating features. Given that two sets of situations might differ with respect to hundreds of features, these are unattractive options. In addition, discrimination, computed this way, makes implausible demands on memory. The learner's brain has to encode into long-term memory every application of every rule. There is no way of knowing in advance which features of a situation will turn out to be crucial for future discriminations, so each memory trace has to be quite detailed. Like all other inductive processes, discrimination lacks a criterion for deciding how much evidence is enough. How many situations of either type have to be in memory before