Sepp Hochreiter, Jürgen Schmidhuber
Standard recurrent nets cannot deal with long minimal time lags between relevant signals. Several recent NIPS papers propose alter(cid:173) native methods. We first show: problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the proposed algorithms. We then use LSTM, our own recent algorithm, to solve a hard problem that can neither be quickly solved by random search nor by any other recurrent net algorithm we are aware of.
1 TRIVIAL PREVIOUS LONG TIME LAG PROBLEMS
Traditional recurrent nets fail in case 'of long minimal time lags between input sig(cid:173) nals and corresponding error signals [7, 3]. Many recent papers propose alternative methods, e.g., [16, 12, 1,5,9]. For instance, Bengio et ale investigate methods such as simulated annealing, multi-grid random search, time-weighted pseudo-Newton optimization, and discrete error propagation . They also propose. an EM ap(cid:173) proach . Quite a few papers use variants of the "2-sequence problem" (and ('latch problem" ) to show the proposed algorithm's superiority, e.g. [3, 1, 5, 9]. Some pa(cid:173) pers also use the "parity problem", e.g., [3, 1]. Some of Tomita's  grammars are also often used as benchmark problems for recurrent nets [2, 19, 14, 11].
Trivial versus non-trivial tasks. By our definition, a "trivial" task is one that can be solved quickly by random search (RS) in weight space. RS works as follows: REPEAT randomly initialize the weights and test the resulting net on a training set UNTIL solution found.