Summary and Contributions: This paper studies the RL problem with retrospective knowledge (i.e., the influence of past events on the present state). The authors propose Reverse GVFs to represent such knowledge. The authors extend some RL algorithms to the Reverse RL setting, including Reverse TD, Distributional Reverse TD, and Off-policy Reverse TD. The authors theoretically prove the convergence of these algorithms under linear function approximation. The paper empirically demonstrates the utility of Reverse GVFs in both anomaly detection and representation learning.
Strengths: 1) This paper focuses on an interesting and practical case of reinforcement learning. Clear examples are provided to demonstrate the difference between predictive knowledge (general RL) and retrospective knowledge (this work), how RL with retrospective knowledge can be used in real-world applications, and why general RL algorithms (GVFs) fail to represent such knowledge. 2) The proposed formulation is reasonable and easy to follow. It is general enough that multiple existing RL algorithms can be extended to the Reverse RL setting. Theoretical analysis is given to justify the convergence of the Reverse RL algorithms (under linear function approximation). 3) The paper addresses reinforcement learning and its applications, both of which are relevant to the NeurIPS community.
Weaknesses: I did not notice any obvious weaknesses.
Correctness: I have some concerns about the empirical evaluation: in Figure 3 (b,c), it is clear that the anomaly is detected after 10^4 steps, since the likelihood becomes high (close to 1). However, shouldn’t this likelihood be as low as possible when there is no anomaly? Before 10^4 steps (no anomaly), the likelihood is already higher than 0.5.
Clarity: The paper is generally well written. In the experimental evaluation, it would be clearer to explain what the legend entries in Fig. 3(b) mean.
Relation to Prior Work: In terms of reinforcement learning, related work is clearly discussed. However, for the applied domains, e.g., anomaly detection, it would be better to describe, or at least discuss, related work that is relevant to RL.
Summary and Contributions: The paper presents the Reverse General Value Function to represent the influence of possible past events on the present observation (i.e., retrospective knowledge). The proposed solution builds on top of the Bellman operator (Yu et al., 2018) and is empirically tested in both anomaly detection and representation learning.
Strengths: - The representation learning experiments show that IMPALA+ReverseGVF is beneficial on some games. - Results suggest that ReverseGVF can help in the detection of synthetically generated anomalies. - I like the recurring example; it really helped me follow the exposition.
Weaknesses: - The exposition of the anomaly detection experiment can be improved. As far as I understand, a simple anomaly is simulated after 10^4 steps. That event triggers a higher estimate of prob(anomaly) from the model. However, the probability before that event is also very high, above 0.5. It is not clear to me why that is the case. - Although the authors argue against a comparison with IMPALA+GVF, I think it would strengthen the paper. Moreover, I would have appreciated a deeper analysis or discussion of why ReverseGVF hurts performance on some games and improves it on others. Are there any characteristics of the games that work against ReverseGVF?
Correctness: See my comments in the Weaknesses section above.
Clarity: Some parts might benefit from further clarification. For instance, I would appreciate a sentence that clarifies whether Assumption 1 is reasonable in a real-world scenario. Also, in Figure 4, I would specify the metric on which the improvement is observed.
Relation to Prior Work: yes
Summary and Contributions: This paper proposes the concept of retrospective knowledge, which requires modeling the influence of possible past events on the present, and studies how to represent retrospective knowledge using reinforcement learning methods. Reverse general value functions are proposed to represent retrospective knowledge. Experiments are conducted to validate the proposed methods on anomaly detection and representation learning.
Strengths: The idea of retrospective knowledge is interesting and should be useful in other scenarios besides anomaly detection. The problem formulation and theoretical analysis strike me as reasonable and elegant. Multiple experiments are conducted to explore the benefit of using reverse GVFs for representation learning. The reinforcement learning community may benefit from the proposed method to improve representation learning.
Weaknesses: The experiments conducted in this paper use only synthetic data. It would be more convincing to use real-world data, e.g., status anomaly detection for autonomous vehicles. The results of the experiment using non-linear function approximation in Figure 3c are not clearly explained. It seems that even before 10^4 steps, the probability of anomaly is high (more than 0.8). The curves in that figure are not readable enough to show the impact of the different setups.
Correctness: The methodology is sound.
Clarity: Most of the paper is clear. However, it lacks an analysis of the non-linear function approximation experiments.
Relation to Prior Work: This paper builds well on related work.
Additional Feedback: Are there other potentially applicable scenarios besides anomaly detection?