Page 265 - Deep Learning
248 Adaptation
that has received much attention in educational research.54 We began by
defining the basic cognitive capabilities that are needed to do subtraction. These
include the ability to allocate visual attention as well as motor skills like writ-
ing and crossing out digits. The initial rules for subtraction were sufficient for
canonical subtraction problems in which the minuend digit is greater than
the subtrahend digit in each column, for example, 678 – 234 = ?. Regrouping
(popularly known as “borrowing”) was not necessary to solve the canonical
problems. We then taught the model to perform correctly on problems that
require regrouping as well. This two-step instructional sequence – canonical
problems followed by problems that require regrouping – corresponds to the
one observed in classroom teaching.55
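
The canonical versus noncanonical distinction can be illustrated with a minimal Python sketch. The function names `digits`, `is_canonical`, and `subtract_canonical` are hypothetical and are not part of the model described here. The sketch assumes non-negative integers with the subtrahend no larger than the minuend, and it treats a column with equal digits as canonical, since such a column needs no regrouping.

```python
def digits(n, width):
    """Decimal digits of n, least significant first, zero-padded to width."""
    return [(n // 10**i) % 10 for i in range(width)]


def is_canonical(minuend, subtrahend):
    """True if no column needs regrouping."""
    if not 0 <= subtrahend <= minuend:
        raise ValueError("this sketch assumes 0 <= subtrahend <= minuend")
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    # Assumption: equal digits count as canonical, since 5 - 5 needs no regrouping.
    return all(t >= b for t, b in zip(top, bottom))


def subtract_canonical(minuend, subtrahend):
    """Subtract column by column; only valid for canonical problems."""
    if not is_canonical(minuend, subtrahend):
        raise ValueError("noncanonical problem: some column needs regrouping")
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    # Each column difference is weighted by its place value.
    return sum((t - b) * 10**i for i, (t, b) in enumerate(zip(top, bottom)))


print(is_canonical(678, 234))        # True
print(is_canonical(623, 478))        # False
print(subtract_canonical(678, 234))  # 444
```
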
We tutored the subtraction model in two different methods for noncanon-
ical problems, the regrouping method taught in most American schools and the
augmenting method preferred in some European schools. The two procedures
differ primarily in how they handle noncanonical columns, either by
decrementing the minuend in the column with the next higher place value or by
incrementing the subtrahend in that column. This difference was once thought
by mathematics educators to be of pedagogical importance.56 We tutored each
of these subtraction methods in two ways that we referred to as procedural
and conceptual. In procedural (rote) arithmetic, the learner sees an arithmetic
problem as a spatial arrangement of digits on a page. In this version, the con-
straints referred to that arrangement, for example, the answer should have a
single digit in each column. In contrast, a conceptual arithmetician thinks about
subtraction mathematically instead of typographically. The characters are sym-
bols for numbers. In this version, the model’s internal representation explicitly
encoded the place value of each digit, and the constraints encoded mathematical
relations, such as if the value of a digit D in column N is $D \times 10^{m}$, then the
value of a digit E in column N+1 should be $E \times 10^{m+1}$. Mathematics educators
regard the conceptual approach to arithmetic as pedagogically superior.
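
As an illustration of how the two algorithms differ, the following Python sketch implements both as column-by-column procedures. It is a sketch of the standard textbook methods, not of the model's rule set; the function names are hypothetical, and the code assumes non-negative integers with the subtrahend no larger than the minuend. In both functions a noncanonical column first gains ten units in the minuend digit. Regrouping compensates by taking one from the minuend digit in the next higher column, while augmenting compensates by adding one to the subtrahend digit in that column.

```python
def to_digits(n, width):
    """Decimal digits of n, least significant first, zero-padded to width."""
    return [(n // 10**i) % 10 for i in range(width)]


def from_digits(ds):
    """Place-value reading: the digit in column i is worth digit * 10**i."""
    return sum(d * 10**i for i, d in enumerate(ds))


def subtract_regrouping(minuend, subtrahend):
    if not 0 <= subtrahend <= minuend:
        raise ValueError("this sketch assumes 0 <= subtrahend <= minuend")
    width = len(str(minuend))
    top, bottom = to_digits(minuend, width), to_digits(subtrahend, width)
    answer = []
    for i in range(width):
        if top[i] < bottom[i]:
            top[i] += 10      # ten extra units in this column ...
            top[i + 1] -= 1   # ... taken from the minuend one column up
        answer.append(top[i] - bottom[i])
    return from_digits(answer)


def subtract_augmenting(minuend, subtrahend):
    if not 0 <= subtrahend <= minuend:
        raise ValueError("this sketch assumes 0 <= subtrahend <= minuend")
    width = len(str(minuend))
    top, bottom = to_digits(minuend, width), to_digits(subtrahend, width)
    answer = []
    for i in range(width):
        if top[i] < bottom[i]:
            top[i] += 10         # ten extra units in this column ...
            bottom[i + 1] += 1   # ... offset by raising the subtrahend one column up
        answer.append(top[i] - bottom[i])
    return from_digits(answer)


print(subtract_regrouping(623, 478))  # 145
print(subtract_augmenting(623, 478))  # 145
```

Both functions return the same answers; they differ only in which digit of the next column absorbs the compensation.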
The distinction between procedural (rote) and conceptual learning
combined with the two different algorithms, regrouping and augmenting,
to define four different learning scenarios. We tutored HS until mastery in
each scenario with the same procedure we would use to tutor a student: We
watched the model solve a problem until it made an error. We interrupted
the model and typed in a constraint that we thought would allow the model
to correct that error. The constraint was added to the constraint base in the
model, and the model was restarted. This cycle continued until the model per-
formed correctly. Mastery was assessed by running the model on a test set of
66 subtraction problems that were not used during training.
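
The tutoring cycle described above can be outlined as a loop. The Python sketch below is a toy illustration only: `ToyModel`, `tutor_until_mastery`, `advise`, and the constraint label are all hypothetical, the toy model does not learn the way HS does, and the problem sets are made-up examples rather than the training or test sets used in the study. The toy model starts out by subtracting the smaller digit from the larger one in every column, and a single typed-in constraint stands in for the tutor's corrections.

```python
REGROUP = "regroup-when-minuend-digit-is-smaller"  # hypothetical constraint label


class ToyModel:
    """Hypothetical stand-in for the learner; not the HS architecture."""

    def __init__(self):
        self.constraint_base = []

    def solve(self, problem):
        minuend, subtrahend = problem
        if REGROUP in self.constraint_base:
            # Shortcut: with the constraint in place, the toy model answers correctly.
            return minuend - subtrahend
        # Without it, the toy model takes the smaller digit from the larger in each column.
        width = len(str(minuend))
        total = 0
        for i in range(width):
            t = (minuend // 10**i) % 10
            b = (subtrahend // 10**i) % 10
            total += abs(t - b) * 10**i
        return total


def advise(problem, model):
    """Stands in for the human tutor typing in a constraint after an error."""
    return REGROUP


def tutor_until_mastery(model, training_set, advise, max_cycles=100):
    """Run the model, interrupt at the first error, add a constraint, restart."""
    for cycle in range(max_cycles):
        failed = next((p for p in training_set if model.solve(p) != p[0] - p[1]), None)
        if failed is None:
            return cycle  # number of constraints that had to be added
        model.constraint_base.append(advise(failed, model))
    raise RuntimeError("no mastery within max_cycles")


def passes_test(model, test_set):
    """Mastery check on problems that were not used during tutoring."""
    return all(model.solve(p) == p[0] - p[1] for p in test_set)


model = ToyModel()
cycles = tutor_until_mastery(model, [(678, 234), (623, 478)], advise)
print(cycles, passes_test(model, [(502, 119), (940, 356)]))
```
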