Page 260 - Deep Learning
Error Correction: The Specialization Theory
The explanation for the high transfer from standard to modified counting
makes a counterintuitive prediction: that transfer in the opposite direction,
from a more constrained to a less constrained task, should be even higher. If
the strategy does not need to be more constrained, it might not require any
modifications at all. This is indeed the case. Transfer from either of the mod-
ified counting tasks to the standard task is even higher than in the opposite
direction. In one case, switching from targeted to standard counting, no rule
revisions are required, so transfer is 100%.
Finally, moving from a task that is constrained one way to a different task
that is constrained in some other way should be the hardest transfer task of
all. This is indeed the case, as Table 7.5 shows. The transfer from ordered to
targeted counting is only 8%. If the rules have been constrained in a different
way than the transfer task requires, the model must back up further in the rule
genealogies and do considerable work to re-specialize the rules, so the
transfer effect is small. In general, the constraint-based specialization
theory predicts that the magnitude of a transfer effect is a function of the
number of learning events needed to revise the prior (intermediate) rules to
fit the transfer task. The theory predicts large transfer effects when a
strategy only needs to be further specialized, and smaller effects when it
needs to be re-specialized.
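
The dependence of transfer on the number of learning events can be made concrete with a small sketch. It is a toy illustration and not the simulation model described in the text: the constraint labels, the representation of a task as a set of constraints, and the one-event-per-revision cost are all assumptions introduced here. The sketch also treats every move to a less constrained task as free, which simplifies the reported results, where only the switch from targeted to standard counting required no revisions.

```python
# Toy model: each task is the set of constraints its counting strategy must satisfy.
# Task names follow the text; the constraint labels are hypothetical.
TASKS = {
    "standard": frozenset(),
    "ordered": frozenset({"follow_given_order"}),
    "targeted": frozenset({"count_only_targets"}),
}


def learning_events(train: str, transfer: str) -> int:
    """Hypothetical number of rule revisions needed to adapt a strategy
    learned on `train` so that it fits `transfer`."""
    learned = TASKS[train]
    required = TASKS[transfer]
    missing = required - learned  # constraints still to be acquired
    surplus = learned - required  # specializations the transfer task does not ask for
    if not missing:
        # Assumption: a strategy that already satisfies every required
        # constraint needs no revision, even if it is over-specialized.
        return 0
    # Assumption: wrong-way specializations are backed out first (one event
    # each), then each missing constraint is learned (one event each).
    return len(surplus) + len(missing)


if __name__ == "__main__":
    pairs = [
        ("targeted", "standard"),  # less constrained target
        ("standard", "ordered"),   # further specialization
        ("ordered", "targeted"),   # re-specialization
    ]
    for train, transfer in pairs:
        print(f"{train} -> {transfer}: {learning_events(train, transfer)} events")
```

Under these assumptions the event counts rise from the less constrained target, through further specialization, to re-specialization, which matches the qualitative ordering of transfer effects described above. The counts themselves are illustrative and are not taken from Table 7.5.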
The fact that the constraint-based specialization mechanism predicts
asymmetric transfer effects is noteworthy because it differentiates this hypoth-
esis from the identical rules hypothesis. Recall that the latter claims that rules
are re-used when the identical rule appears in the strategy for a training task
and the strategy for a transfer task. This formulation predicts that the magni-
tude of the transfer effect when moving from task X to task Y is directly pro-
portional to the overlap between the two rule sets; that is, to the number of
rules needed to perform X that are also required to perform Y. This measure is
necessarily symmetrical, so the identical rules hypothesis implausibly predicts
that the amount of transfer from task X to task Y is always and necessarily
equal to the amount of transfer from task Y to task X. This strong prediction is
falsified by empirical studies by Miriam Bassok and others.⁵⁰ The
constraint-based transfer mechanism claims that symmetrical transfer effects
are accidents and that the principled regularity is that transfer effects are
proportional
to the number of learning events (not the number of knowledge elements)
that are required to adapt the prior rules to the transfer task. Hence, effects are
large when the transfer task is consistent with but more constrained than the
training task; because “more constrained than” is asymmetric, so are trans-
fer effects. The empirical data in the literature are not sufficient to evaluate