Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
This paper proposes an interesting idea for efficiently forecasting non-stationary time series multiple time steps ahead. To achieve this, the authors introduce an objective function called the Shape and Time Distortion Loss (STDL) to train deep neural networks. The paper is well written, clear, and of certain significance. In Figure 1 the authors illustrate the limitations of certain existing approaches as a way to motivate their contributions. I am not convinced by the arguments presented there. 1) Online regression with adaptive tuning parameters can accurately predict the target shown in Figure 1 (e.g. Recursive Least Squares with forgetting factors, a Kalman filter with adaptive noise, and so on) without recourse to new approaches. 2) Those algorithms are missing from the related work section. Furthermore, the authors state that "Experiments carried out on several synthetic and real non-stationary datasets reveal that models trained with STDL significantly outperform models trained with the MSE loss function when evaluated with shape and temporal distortion metrics, while STDL maintains very good performance when evaluated with MSE." This is a strong statement, as the authors do not compare their approach with online methods capable of dealing with non-stationary time series. The approach of training deep neural networks with the shape and time distortion loss is nonetheless interesting, but it deserves to be compared with competing approaches that have shown promise for non-stationary forecasting.
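To make the suggested baseline concrete, here is a minimal sketch of recursive least squares (RLS) with a forgetting factor, the kind of adaptive online method the review refers to. All names, the AR order, and the drifting test signal are illustrative choices, not taken from the paper.

```python
import numpy as np

def rls_forgetting(y, order=2, lam=0.98, delta=100.0):
    """One-step-ahead forecasts of y via recursive least squares
    with forgetting factor lam (0 < lam <= 1)."""
    n = len(y)
    w = np.zeros(order)                # AR coefficient estimates
    P = delta * np.eye(order)          # inverse-correlation estimate
    preds = np.zeros(n)
    for t in range(order, n):
        x = y[t - order:t][::-1]       # most recent samples first
        preds[t] = w @ x               # a-priori prediction
        e = y[t] - preds[t]            # a-priori error
        k = P @ x / (lam + x @ P @ x)  # gain vector
        w = w + k * e                  # coefficient update
        P = (P - np.outer(k, x @ P)) / lam
    return preds

# Non-stationary test signal: a sinusoid with slowly growing amplitude.
t = np.arange(500)
y = np.sin(2 * np.pi * t / 50) * (1 + t / 500)
p = rls_forgetting(y)
```

Because the forgetting factor discounts old samples, the filter tracks the slowly changing amplitude, which is why such methods are a natural point of comparison for non-stationary forecasting.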
1. The distortion problem is well known in time-series data. This paper proposes a new objective function, the Shape and Time Distortion Loss (STDL), to address time series forecasting for non-stationary signals and multi-step-ahead prediction.
2. The proposed loss lets researchers train an end-to-end forecasting model that handles shape and time distortion errors in a convenient way.
3. The results of this work show the proposed loss significantly outperforms baseline loss functions such as MSE and DTW. Moreover, the paper shows at the end that the loss function helps smaller models perform as well as some larger models.
4. This work also points out that computing the loss via the authors' custom backward pass runs in O(nm). The custom backward pass achieves a significant speedup over the standard PyTorch auto-differentiation mechanism, and the speedup grows with the prediction length k.
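For intuition about the O(nm) cost mentioned in point 4, here is a sketch of a soft-DTW-style forward recursion, the kind of dynamic program that underlies shape losses of this family. This is an illustrative reconstruction, not the paper's exact loss (which also includes a temporal term); the smoothing parameter gamma and the squared-difference cost are assumptions.

```python
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW alignment cost between 1-D series x (len n) and y (len m).
    The DP table fills in O(n*m), matching the complexity cited above."""
    n, m = len(x), len(y)
    D = (x[:, None] - y[None, :]) ** 2       # pairwise squared costs
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            z = r.min()                      # stabilized soft-min
            softmin = z - gamma * np.log(np.sum(np.exp(-(r - z) / gamma)))
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]
```

A custom backward pass reuses the table R in a second O(nm) sweep instead of letting autograd unroll the n*m soft-min operations, which is the likely source of the reported speedup.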
It is hard to say what constitutes a good prediction, and loss functions now matter more than model architecture development, given neural networks and auto-differentiation. All the losses we use are proxies inspired by loss functions in ML that were developed because they were easy to optimize a model for. So adding a new, intuitively appealing loss function is significant. The paper is well written and enjoyable to read. The evaluation is fairly comprehensive and looks soundly executed. For completeness, you might use STDL as an evaluation loss as well. Are you able to summarize generic conclusions about which features of a time series the individual losses train well vs. poorly? E.g. "MSE rounds off sharp level changes." It puzzles me why people consider ECG a time series prediction task. It is commonly used as a dataset, but in what real-world scenario would you want to predict it? How fast is the computation? The dynamic programming looks pretty complex. You only give the speedup: what is the absolute number? Is the evaluation of the loss a bottleneck in the learning? How do you set the \delta values in your experiments?
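The "MSE rounds sharp level changes" point can be made concrete with a small numerical sketch (not from the paper): against a sharp step target, MSE scores a forecast that has the right shape but a delayed step worse than a flat forecast that misses the step entirely.

```python
import numpy as np

# Target: a sharp step from 0 to 1 at t = 10.
target = np.concatenate([np.zeros(10), np.ones(10)])

# Forecast A: correct shape, but the step is delayed by 6 steps.
shifted = np.concatenate([np.zeros(16), np.ones(4)])

# Forecast B: no step at all, constant at the mean level.
flat = np.full(20, 0.5)

def mse(a, b):
    return np.mean((a - b) ** 2)

print(mse(shifted, target))  # 0.30 -- sharp but delayed
print(mse(flat, target))     # 0.25 -- shapeless, yet lower MSE
```

Under MSE, the shapeless constant forecast wins, which is one way to see why MSE-trained models tend to smooth out sharp level changes and why a shape-and-time-aware loss is appealing.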