Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Originality: This paper takes the next step for stochastic computing with LSTMs by quantizing all operations. It follows the obvious path and applies standard stochastic computing methods to get there. The novelty comes from the experimental analysis. Quality: There are no technical flaws. However, the evaluation metrics are defined in a way that skews them toward showing extraordinary benefits, because they neglect some important contributions to the overall cost. Also, relevant newly introduced hyperparameters are chosen arbitrarily instead of being evaluated. Clarity: The evaluations assume a hardware architecture for both the baseline and the authors' own work, but this architecture is not described, which makes some parts of the comparison impossible to understand in depth. The space required for this could be obtained by shortening the 2-page primer on stochastic computing to a minimum. Some parameters (e.g. l) were chosen arbitrarily. Significance: Hard to say, because the presentation of the results does not allow a comparison with non-stochastic computing methods. On the points that are compared, the results seem very promising.
This paper proposes an implementation of LSTMs that is believed to be more compatible with hardware chips, replacing the continuous non-linear operations of RNNs with logical operators based on XNOR gates and stochastic computation. The proposed WIStanh seems to do a much better job of approximating the hyperbolic tangent function than the IStanh function (baseline). I am a bit concerned about the binarization procedure and the trick the authors use to propagate the gradients, Eq. (19). Did the authors experience saturation issues when using this trick?
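As background for the XNOR-based operators mentioned above, here is a minimal sketch of bipolar stochastic multiplication. It assumes the textbook encoding in which a value x in [-1, 1] is carried by a bit stream with P(bit = 1) = (x + 1) / 2. The helper names, stream length, and seed are illustrative choices and are not taken from the paper.

```python
import random


def encode_bipolar(x, length, rng):
    """Encode x in [-1, 1] as a bit stream with P(bit = 1) = (x + 1) / 2."""
    p = (x + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode_bipolar(bits):
    """Recover the bipolar value from the fraction of ones in the stream."""
    return 2.0 * sum(bits) / len(bits) - 1.0


def xnor_multiply(a_bits, b_bits):
    """Bitwise XNOR; for independent bipolar streams this multiplies the values."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)   # fixed seed (assumption, for reproducibility only)
n = 1 << 14              # stream length (assumption); longer streams reduce noise
a = encode_bipolar(0.5, n, rng)
b = encode_bipolar(-0.4, n, rng)

# The product stream has expected value 0.5 * -0.4 = -0.2; the estimate is noisy.
product = decode_bipolar(xnor_multiply(a, b))
print(product)
```

The estimate converges to the exact product only as the stream length grows, which is one reason the stream length is a cost factor worth reporting alongside gate counts.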
Cons: Sentence structure and terminology are a bit unclear in some cases [e.g. lines 69, 79]. Some measures of comparison are not fully explained [e.g. Xilinx FPGA slices]. Figure 3 is not very well explained [e.g. what was the sampling process, how many samples?]. The best result in each row of Table 1 could be indicated.
Pros: Novel approach. Clear outline of the data set used for benchmarks. Well-structured sections. Good background section, with well-explained theory and methodology. The experiments and comparisons to other systems were well reasoned. The motivation of the project was clearly outlined and justified. The reported results show a large decrease in computational complexity while achieving better accuracy than similar models.
This paper proposes to use stochastic logic to approximate the activation functions of an LSTM. Binarization of non-linear units in deep neural networks is an interesting topic that can be relevant for low-resource computing. The main contribution of the paper is the application of stochastic logic to approximate activation functions (e.g. tanh). The authors applied the technique to a variant of the LSTM model on PTB. Given that the technique is not really tied to the LSTM model, it would be more interesting to evaluate more model architectures (e.g. transformers) and to compare them with their non-stochastic-logic counterparts. How would the approach compare to alternatives such as lookup-table-based transformations (given that we are already accumulating)? Given that PTB is a small dataset, which makes the results favorable for compression, would the approach generalize to bigger datasets (e.g. Wikipedia)? Directions for improvement: A dedicated evaluation of the stochastic-logic-based activation would enhance the paper and allow the technique to be applied to a broader range of applications. From a practical point of view, implementing the method on real hardware (e.g. an FPGA) would make the results more convincing, as the cost might go beyond the number of gates. Finally, a major part of the paper is about designing the FSM for the stochastic logic activation. While this could be a contribution of the paper, it might be less relevant to the NeurIPS audience.
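A dedicated evaluation of an FSM-based activation could start from a small software model. The sketch below implements the classic saturating-counter stochastic tanh from the stochastic computing literature, which approximates tanh(n_states / 2 * x) for a bipolar input stream. It is not the paper's IStanh or WIStanh, whose details are not reproduced here. The function names, the number of states, the stream length, and the seed are assumptions for illustration.

```python
import math
import random


def encode_bipolar(x, length, rng):
    """Encode x in [-1, 1] as a bit stream with P(bit = 1) = (x + 1) / 2."""
    p = (x + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode_bipolar(bits):
    """Recover the bipolar value from the fraction of ones in the stream."""
    return 2.0 * sum(bits) / len(bits) - 1.0


def stanh(bits, n_states):
    """Saturating up/down counter: emit 1 while the state is in the upper half."""
    state = n_states // 2
    out = []
    for b in bits:
        if b:
            state = min(state + 1, n_states - 1)  # count up, saturate at the top
        else:
            state = max(state - 1, 0)             # count down, saturate at zero
        out.append(1 if state >= n_states // 2 else 0)
    return out


rng = random.Random(1)   # fixed seed (assumption, for reproducibility only)
n_states = 8             # hypothetical FSM size
x = 0.3
stream = encode_bipolar(x, 1 << 15, rng)
estimate = decode_bipolar(stanh(stream, n_states))
reference = math.tanh(n_states / 2 * x)
print(estimate, reference)
```

Sweeping x over [-1, 1] and comparing the estimate with the reference for several stream lengths would give the kind of approximation-error curve the reviews ask about, independent of any particular network.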