Page 260 - Deep Learning

Error Correction: The Specialization Theory   243

               The explanation for the high transfer from standard to modified counting
            makes a counterintuitive prediction: that transfer in the opposite direction,
            from a more constrained to a less constrained task, should be even higher. If
            the strategy does not need to be more constrained, it might not require any
            modifications at all. This is indeed the case. Transfer from either of the mod-
            ified counting tasks to the standard task is even higher than in the opposite
            direction. In one case, switching from targeted to standard counting, no rule
            revisions are required, so transfer is 100%.
               Finally, moving from a task that is constrained one way to a different task
            that is constrained in some other way should be the hardest transfer task of
            all. This is indeed the case, as Table 7.5 shows. The transfer from ordered to
            targeted counting is only 8%. If the rules have been constrained in a different
            way than required, the model needs to back up further in the rule genealogies,
            and much work is required to re-specialize the rules; hence the transfer effect
            is small. In general, the constraint-based specialization theory predicts that
            the magnitude of a transfer effect is a function of the
            number of learning events needed to revise the prior (intermediate) rules to
            fit the transfer task. The theory predicts large transfer effects when a strat-
            egy only needs to be further specialized, lesser effects when it needs to be
            re-specialized.
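
The dependence of transfer on the number of learning events can be made concrete with a small sketch. It assumes a conventional savings measure (the fraction of learning events saved relative to learning the transfer task from scratch), which the passage does not itself specify. The function names and all event counts below are hypothetical, chosen only to illustrate the predicted ordering; they are not taken from Table 7.5.

```python
def transfer_savings(events_from_scratch: int, events_after_training: int) -> float:
    """Fraction of learning events saved on the transfer task (assumed measure)."""
    if events_from_scratch <= 0:
        raise ValueError("events_from_scratch must be positive")
    return 1.0 - events_after_training / events_from_scratch


# Hypothetical number of learning events needed to learn each task from scratch.
SCRATCH = {"standard": 10, "ordered": 12, "targeted": 12}

# Hypothetical number of learning events needed to revise the prior rules,
# keyed by (training task, transfer task). Illustrative values only.
AFTER_TRAINING = {
    ("standard", "ordered"): 3,   # rules only need further specialization
    ("ordered", "standard"): 1,   # more constrained to less constrained
    ("targeted", "standard"): 0,  # no rule revisions required
    ("ordered", "targeted"): 9,   # rules must be re-specialized
}


def savings(training: str, transfer: str) -> float:
    """Savings when moving from the training task to the transfer task."""
    return transfer_savings(SCRATCH[transfer], AFTER_TRAINING[(training, transfer)])


if __name__ == "__main__":
    for pair in AFTER_TRAINING:
        print(pair, round(savings(*pair), 2))
```

Because the measure depends on the events needed in a given direction, nothing forces the savings from X to Y to equal the savings from Y to X.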
               The fact that the constraint-based specialization mechanism predicts
            asymmetric transfer effects is noteworthy because it differentiates this hypoth-
            esis from the identical rules hypothesis. Recall that the latter claims that rules
            are re-used when the identical rule appears in the strategy for a training task
            and the strategy for a transfer task. This formulation predicts that the magni-
            tude of the transfer effect when moving from task X to task Y is directly pro-
            portional to the overlap between the two rule sets; that is, to the number of
            rules needed to perform X that are also required to perform Y. This measure is
            necessarily symmetrical, so the identical rules hypothesis implausibly predicts
            that the amount of transfer from task X to task Y is always and necessarily
            equal to the amount of transfer from task Y to task X. This strong prediction is
            falsified by empirical studies by Miriam Bassok and others.[50] The
            constraint-based transfer mechanism claims that symmetrical transfer effects
            are accidents and that the principled regularity is that transfer effects are
            proportional
            to the number of learning events (not the number of knowledge elements)
            that are required to adapt the prior rules to the transfer task. Hence, effects are
            large when the transfer task is consistent with but more constrained than the
            training task; because “more constrained than” is asymmetric, so are trans-
            fer effects. The empirical data in the literature are not sufficient to evaluate