| Paper ID: | 6218 |
|---|---|
| Title: | DTWNet: a Dynamic Time Warping Network |

The method is novel, and the work is well done and presents high-quality research. The presentation is excellent: the method is explained clearly and in sufficient detail to reproduce the results (although this is not needed, as the authors provide the code). The presented work is significant, with both academic and industrial impact.

There are many papers that have tried to combine DTW and neural networks, but the alternating optimization approach in this paper appears to be new and significant to me. While the idea of the paper is exciting, the execution is poor and the writing is disappointing. Here are the main feedback points:

1. The writing omits the definitions of many notations. For example, I could not find the definition of the forward pass for the main network G_{x, w}, and I could not figure out how the authors go from x to z. Another example is line 4 of Algorithm 1: it refers to f_t(x, y) in Eq. (2), but f_t(x, y) does not appear there. I did search the provided code for answers to these questions, but did not spend much time reading it because it lacks proper documentation.
2. It is possible to prove some form of convergence result for alternating optimization algorithms. However, the theoretical analysis in this paper is not rigorous. In particular, Theorem 1 makes unacceptable assumptions. For example, consider "Assume that \tilde{x} is close to global minima": this assumption may never hold. We cannot place assumptions on the solutions; we can only place assumptions on the initial conditions and the learning algorithms, and then show that the solution will be in the \epsilon-neighborhood of the final solutions after t iterations.
3. It seems that the idea will only work for univariate time series. Can the authors comment on this?
4. The experiments are insufficient. The authors need to evaluate their proposed algorithm on more, and more diverse, datasets and report speed results. One dataset from the UCR repository is not enough.
5. The authors make some hand-waving claims in the abstract, such as interpretability, without fully supporting them in the experiments.

=== After reading the response and discussions ===

I appreciate the authors' response. I keep my score. The authors' argument about the non-linearity of neural networks, and thus having low-quality theorems, is not acceptable. These days there are plenty of high-quality theorems for deep learning. The authors' argument about interpretability is also invalid. Overall, I don't think this paper is ready for publication.

=== Update 2 ===

After discussion with Reviewer #1, I updated my vote. The authors need to seriously improve the presentation of this paper.

This paper presents a learning framework based on Dynamic Time Warping. A DTW kernel is applied in neural networks with stochastic backpropagation. Both an end-to-end DTW and a streaming version are implemented in this work. The method has been tested on both synthetic and real data. This work applies learnable DTW kernels in neural networks to represent Doppler invariance in the data. Instead of using DTW as a loss function, the DTW kernel is employed to obtain better feature extraction and to generate an interpretable representation. The DTW loss and its convergence are theoretically analysed. The proposed technique can be further improved. For example, it would be helpful to know how to decide the number of DTW layers. In the INPUT part of Algorithm 1, the kernels are initially set as input. However, the kernels are randomly initialised in the OUTPUT part, which should be clarified. Also, in the experiments, it would be helpful if the proposed method were tested on more real datasets for the application part.
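
For readers less familiar with the operation under review, the following is a minimal sketch of the textbook DTW cost between a kernel and a univariate input sequence. It is not the authors' implementation: the function name `dtw_distance`, the squared point-wise error, and the plain-Python dynamic program are assumptions made only for illustration.

```python
import math


def dtw_distance(kernel, series):
    """Textbook O(n*m) DTW cost using squared point-wise error (illustrative only)."""
    n, m = len(kernel), len(series)
    # cost[i][j] holds the minimal accumulated cost of aligning
    # kernel[:i] with series[:j]; row/column 0 are boundary padding.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (kernel[i - 1] - series[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Under these assumptions, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated sample is absorbed by the warping path, which is the kind of invariance to local time stretching that the review refers to.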