Page 214 - Deep Learning

The Growth of Competence                 197

            does not know the right step with any certainty but moves forward anyway.
            In some proportion of such situations, the step taken will, in fact, turn out
            to be appropriate, correct or useful. In those situations, positive feedback
            helps to reduce the uncertainty about that step. This type of effect can be
            modeled with strengthening, a learning mechanism that was formulated by
            Thorndike and constitutes the first half of his Law of Effect: When a rule
            generates a positive outcome, increase its strength.⁶⁰ The consequence is
            that the rule will win against its competitors more often, thus being executed
            more often. In Thorndike’s own words: “The one impulse, out of many accidental
            ones, which leads to pleasure, becomes strengthened and stamped
            in thereby, and more and more firmly associated with the [relevant] sense-impression.
            … [Learning curves] represent the wearing smooth of a path in
            the brain.”
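
               The strengthening mechanism can be made concrete with a minimal sketch.
            The names Rule, select_rule, strengthen and win_probability are illustrative
            assumptions, as are the strength-proportional choice among competing rules
            and the fixed increment of 1.0; the text specifies only that a positive
            outcome increases a rule's strength and that a stronger rule wins more often.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    """A competing rule; `strength` governs how often it wins (assumed scheme)."""
    name: str
    action: str
    strength: float = 1.0


def select_rule(rules, rng):
    """Pick one rule with probability proportional to its strength (an assumption)."""
    weights = [rule.strength for rule in rules]
    return rng.choices(rules, weights=weights, k=1)[0]


def strengthen(rule, increment=1.0):
    """First half of the Law of Effect: a positive outcome increases strength."""
    rule.strength += increment


def win_probability(rule, rules):
    """Chance that `rule` wins the competition under proportional selection."""
    return rule.strength / sum(r.strength for r in rules)


if __name__ == "__main__":
    rng = random.Random(0)
    competitors = [Rule("pull-loop", "pull"), Rule("push-door", "push")]
    for _ in range(20):
        chosen = select_rule(competitors, rng)
        # Hypothetical environment: only pulling the loop is rewarded.
        if chosen.action == "pull":
            strengthen(chosen)
    for rule in competitors:
        print(rule.name, round(win_probability(rule, competitors), 2))
```

               Each rewarded execution raises the rule's share of the total strength, so
            the rule is chosen more often on later trials; other selection schemes (for
            example, always executing the strongest rule) would fit the text equally well.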
               However, positive outcomes invite other, more generative processes as
            well. If the learner tentatively performed action A in situation S in pursuit of
            goal G, the outcome turned out to be positive and there is no prior rule that
            would have recommended that step, then it is reasonable to create the new
            rule

                                       G, S ⇒ A,
            which will recommend the successful action in future encounters with S; this is
            sometimes called bottom-up rule generation.⁶¹ One complication is that the situation
            S is history by the time the positive feedback arrives and will not recur,
            so the rule needs to specify situations like S, rather than S itself. Bottom-up
            learning requires a generalization process that can determine, given the example
            situation S, in which class of situations, {S}, the action A will produce a
            positive outcome.⁶² It is not obvious how such a generalization process might
            work. If I see a movie by director X with story line Y and I like the movie, it
            makes more sense to conclude that I should see more movies by director X
            than to conclude that I should see more movies with story line Y. The right
            conclusion is obvious in each specific case, but it is less obvious how it can be
            computed by a general mechanism.
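
               Bottom-up rule generation can be sketched as follows. The sketch assumes
            that situations are sets of feature strings and that the caller supplies the
            set of features judged relevant; the names Rule, generalize and
            learn_from_success are hypothetical. The generalize function is a deliberate
            placeholder: it does not solve the generalization problem raised above, it
            only marks where such a process would plug in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule of the form G, {S} => A over feature-set situations (assumed encoding)."""
    goal: str
    condition: frozenset  # features a situation must contain for the rule to apply
    action: str

    def matches(self, goal, situation):
        """True if the goal is the same and the situation has every required feature."""
        return goal == self.goal and self.condition <= situation


def generalize(situation, relevant):
    """Placeholder generalization: keep only features the caller deems relevant."""
    return frozenset(f for f in situation if f in relevant)


def learn_from_success(rules, goal, situation, action, relevant):
    """After a positive outcome, add a rule unless a prior rule already recommends it."""
    situation = frozenset(situation)
    if any(r.matches(goal, situation) and r.action == action for r in rules):
        return None  # an existing rule covers this step; nothing new to create
    new_rule = Rule(goal, generalize(situation, relevant), action)
    rules.append(new_rule)
    return new_rule


if __name__ == "__main__":
    rules = []
    # The movie example: the director, not the story line, is assumed relevant.
    rule = learn_from_success(
        rules,
        goal="enjoy a movie",
        situation={"director=X", "story=Y"},
        action="see the movie",
        relevant={"director=X"},
    )
    print(rule)
```

               Because the stored condition is the generalized class {S} rather than S
            itself, the new rule applies to later situations that share the retained
            features, such as another movie by director X with a different story line.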
               One possible technique is to extract shared features across multiple situations.
            For example, users of word-processing software learn to select text
            by double clicking on a paragraph, and also to select graphical elements by
            double clicking. It is plausible that the similarities between the corresponding
            rules result in a more general rule of the form if you want to select any object on
            a computer screen, double click it. This rule is more general than its parents and