Sun, Dec 8th through Sat, Dec 14th, 2019, at the Vancouver Convention Center
With both methods, the authors demonstrate performance similar to backpropagation with ResNet-18 and ResNet-50 architectures on ImageNet. To me, this is the biggest strength of the paper. There have been many proposals for learning algorithms that do not rely on weight transport (feedback alignment included), but they have only been evaluated on toy tasks and on non-convolutional neural networks. The main weakness is the lack of experiments with more architectures on ImageNet, either other variants of ResNet (101, 152, etc.) or architectures outside the ResNet family (VGG, AlexNet, EfficientNet). If such experiments are difficult to run, is it because the algorithm requires more hyperparameter tuning than standard backpropagation? If that is indeed the case, it should be stated explicitly in the main text of the paper.
The study proposes two alternative mechanisms for biologically plausible learning that do not require the weight transport used in backprop. The authors introduce the two mechanisms, weight mirrors and the Kolen-Pollack (KP) algorithm, in a way that is easy to understand while providing sufficient mathematical detail. The experiments show that both methods, particularly KP, achieve lower test error on the ImageNet task with the smaller ResNet than feedback alignment, sign-symmetry, and backprop, whereas with the larger ResNet feedback alignment clearly wins. Both weight mirroring and KP were found to successfully align the forward and backward weight matrices. Although the results are preliminary and more experiments are definitely needed to establish these alternative mechanisms, I find this a nice piece of work that demands attention and further investigation from the community. Minor comments: On line 168, it is mentioned that sign-symmetry did better in other experiments, but those experiments are not shown. I would encourage the authors to include them in the SI.
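To make the alignment claim concrete, the two mechanisms can be sketched in a few lines of NumPy. This is a minimal toy illustration under assumed hyperparameters (learning rates, decay, layer sizes are all illustrative, not taken from the paper): KP applies the same transposed update plus a shared weight decay to the forward matrix W and the feedback matrix B, so W - B^T decays to zero; a weight mirror drives the layer with zero-mean noise and nudges B toward the outer product of input and output, so B drifts toward (a scaled) W^T in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 20, 10  # illustrative layer sizes

# --- Kolen-Pollack sketch: forward W and feedback B get the same
# (transposed) update plus shared weight decay, with no weight transport.
eta, decay = 0.01, 0.05
W = rng.standard_normal((n_out, n_in))
B = rng.standard_normal((n_in, n_out))
gap0 = np.linalg.norm(W - B.T)
for _ in range(200):
    y = rng.standard_normal(n_in)        # stand-in layer activity
    delta = rng.standard_normal(n_out)   # stand-in error signal
    dW = -eta * np.outer(delta, y)       # locally available outer product
    W = (1 - decay) * W + dW
    B = (1 - decay) * B + dW.T           # same update, transposed
gap_kp = np.linalg.norm(W - B.T)         # shrinks geometrically toward 0

# --- Weight-mirror sketch: feed zero-mean noise xi through the layer and
# update B toward xi * y^T; since E[xi xi^T] = I, E[dB] is proportional
# to W^T, so B aligns with W^T over time.
def cos_align(A, C):
    return float(np.sum(A * C) / (np.linalg.norm(A) * np.linalg.norm(C)))

W2 = rng.standard_normal((n_out, n_in))
B2 = rng.standard_normal((n_in, n_out))
rate = 1e-3
for _ in range(5000):
    xi = rng.standard_normal(n_in)       # mirror-mode noise input
    y = W2 @ xi
    B2 = (1 - rate) * B2 + rate * np.outer(xi, y)
align_wm = cos_align(W2, B2.T)           # approaches 1 as B2 -> W2^T
```

Running this, `gap_kp` collapses to a tiny fraction of `gap0` and `align_wm` ends close to 1, which is the alignment behavior the experiments report; in the paper the updates are of course driven by real activities and error signals rather than random stand-ins.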
Getting rid of weight transport is an important step toward biologically plausible deep learning algorithms, and both proposed mechanisms are valid contributions to this goal. While not entirely surprising, the empirical results show the effectiveness of the proposed methods. The biggest criticism is perhaps that both methods rely on one-to-one mappings between y and delta nodes. This assumption is unlikely to be met in biological systems, and it is not obvious how it could be circumvented. It would be good if the authors could comment on the plausibility of this assumption and possibly discuss alternative scenarios (e.g., is there a way to have a linear mapping between delta and y rather than the identity? Are there known circuits that somewhat resemble this direct correspondence?) -- The authors have addressed my concerns in the rebuttal. I am updating my score.