Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper analyzes how the distributions of activation variance and correlation propagate through fully-connected feedforward networks, extending a line of prior work that has focused on the mean-field (large-width) limit, in which these distributions concentrate around their means. The reviewers agree that the theoretical analysis is novel and of interest to the community. Particularly noteworthy is the explicit form of the transition kernel for the variance. Some reviewers expressed reservations about the thoroughness of the experimental analysis, but as the main contributions of this work are theoretical in nature, I believe the experiments are sufficient. In the final version, the authors should discuss connections to the looks-linear initialization from Balduzzi et al.