Title: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
Authors: Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli
Book: Advances in Neural Information Processing Systems 30
Editors: I. Guyon and U.V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett
Publisher: Curran Associates, Inc.
Published: 2017
Pages: 4785-4795
Type: Conference Proceedings
Event type: Poster
Language: en-US
Subject: Neural Information Processing Systems http://nips.cc/
Description: Paper accepted and presented at the Neural Information Processing Systems Conference (http://nips.cc/)

Abstract: It is well known that weight initialization in deep networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is O(1) is essential for avoiding exponentially vanishing or exploding gradients. Moreover, in deep linear networks, ensuring that all singular values of the Jacobian are concentrated near 1 can yield a dramatic additional speed-up in learning; this is a property known as dynamical isometry. However, it is unclear how to achieve dynamical isometry in nonlinear deep networks. We address this question by employing powerful tools from free probability theory to analytically compute the entire singular value distribution of a deep network's input-output Jacobian. We explore the dependence of the singular value distribution on the depth of the network, the weight initialization, and the choice of nonlinearity. Intriguingly, we find that ReLU networks are incapable of dynamical isometry. On the other hand, sigmoidal networks can achieve isometry, but only with orthogonal weight initialization. Moreover, we demonstrate empirically that deep nonlinear networks achieving dynamical isometry learn orders of magnitude faster than networks that do not. Indeed, we show that properly-initialized deep sigmoidal networks consistently outperform deep ReLU networks. Overall, our analysis reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.
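The abstract's statement about deep linear networks can be checked numerically. The sketch below is an illustration only and is not the paper's free-probability calculation: it builds the input-output Jacobian of a deep linear network as a product of weight matrices and compares its singular values under orthogonal and Gaussian initialization. The function names, the width of 64, the depth of 20, and the 1/width Gaussian variance are all assumptions chosen for the example.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via QR of a Gaussian matrix (hypothetical helper)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the distribution does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))


def linear_jacobian_singular_values(weights):
    """Singular values of the input-output Jacobian of a deep linear network.

    With no nonlinearity, the Jacobian is simply the product of the weight matrices.
    """
    jac = np.eye(weights[0].shape[1])
    for w in weights:
        jac = w @ jac
    return np.linalg.svd(jac, compute_uv=False)


rng = np.random.default_rng(0)
width, depth = 64, 20  # assumed sizes, chosen only for illustration

orth_weights = [random_orthogonal(width, rng) for _ in range(depth)]
# Gaussian weights with variance 1/width, so the mean squared singular value is O(1) on average.
gauss_weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]

sv_orth = linear_jacobian_singular_values(orth_weights)
sv_gauss = linear_jacobian_singular_values(gauss_weights)

print("orthogonal: min %.3g, max %.3g" % (sv_orth.min(), sv_orth.max()))
print("gaussian:   min %.3g, max %.3g" % (sv_gauss.min(), sv_gauss.max()))
```

A product of orthogonal matrices is itself orthogonal, so every singular value in the orthogonal case equals 1 up to rounding, which is the dynamical isometry property the abstract describes. In the Gaussian case the singular values spread over a wide range even though the per-layer variance is scaled to keep their mean square of order one.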