Page 213 - Deep Learning

196 Adaptation

Stage 2: Mastery
The second stage of practice begins when the learner completes the task for the
first time and lasts until he can reliably perform the task correctly. During
this stage, the main theoretical problem is how the learner can improve his
incomplete and possibly incorrect version of the strategy-to-be-learned. The
most important source of information during this middle stage consists of the
outcomes of the actions generated by the current version of the target skill. To
act tentatively in an unfamiliar environment is to ask questions of that
environment. Pushing a button is to ask, “What does this button do?” and the machine’s
response is the answer; poking a piece of coral with the tip of a diver’s knife
is to ask, “What is this?”; if it scampers off, the information gained is that this is a
well-camouflaged fish. In general, the relation between action and outcome
contains information. Since the emergence of cybernetics in the 1940s,
information to the effect that the learner’s action was appropriate, correct or useful
has been called positive feedback. The source of such information can be an instructor
or a peer (“good job”; “that’s the right answer”; etc.), but it can also be the material
task environment (“oh, I see; this button does turn on the red light”). Information
to the effect that the learner’s action was erroneous, incorrect, inappropriate or
unhelpful in some respect is called negative feedback. It, too, can come from a
tutor (“perhaps you should double-check that answer”) or originate in the
material environment, as when a novice driver fishtails when trying to drive on ice
for the first time.
Different mechanisms are required to learn from these two types of
information. In their 1966 review of learning theories, Ernest R. Hilgard and
Gordon H. Bower stated this point succinctly, using Right to refer to positive
feedback and Wrong to refer to negative feedback:

There is a logical difference between responding in the intelligent direc-
tion to Right and Wrong. The intelligent response to Right is to do again
what was last done. This makes possible immediate rehearsal; the task is
clear. The intelligent response to Wrong is to do something different, but
what to do is less clear. It is necessary both to remember what not to do
and to form some sort of hypothesis as to what to do. 58

The main conundrum regarding learning from positive feedback is what
there is to learn: to elicit positive feedback, the learner must have done the
right thing. But if he already knows what to do, then how does the positive
feedback help? It almost certainly has some role to play, because giving
positive feedback is a common move by experienced tutors. 59 One possibility is
that many actions taken during skill acquisition are tentative. The learner
