Review for NeurIPS paper: POMDPs in Continuous Time and Discrete Spaces

NeurIPS 2020

POMDPs in Continuous Time and Discrete Spaces

Review 1

Summary and Contributions: The manuscript presents an optimal control approach for a partially observable Markov decision process (POMDP) in continuous time with discrete state-spaces. The contribution is the formulation of a continous-time POMDP model and an algorithm to solve the Hamilton-Jacobi-Bellman equation using a neural network approximation to the values function

Strengths: The considered problem is - as far as I know - novel. The theoretical derivation are sound

Weaknesses: Post-rebuttal: I really appreciate the thoughtful response by the authors. The main issue for me was the motivation of this work and I would like to see good reasons that this new modelling paradigm is not only considered out of curiosity. I understand now, that it is an interesting problem to study but I am still unsure about the practical relevance since the control itself will always be implemented in some sort of (digital) time-discrete fashion. ======================================== I am not sure how relevant the particular problem formulation might be for the community, as a time discretization seems to be more convenient in many cases. The experimental evaluation is weak. In particular the gridworld problem is an example, which would have been much easier to solve in the continuous state space domain and it cannot be expected to scale well for larger dimensions.

Correctness: Yes, I couldn't find any flaws.

Clarity: Yes, I think for the available space, the authors have done their best to clearly formulate their contribution. However this topic requires background knowledge from different areas and is therefore not easy to follow.

Relation to Prior Work: The related work is properly reviewed. However, I would like to the in the experiment a comparison to alternative methods (even if they operate in discrete time or continuous state spaces)

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper makes two contributions: (i) a theoretical contribution in which they extend the theory of POMDPs to continuous-time and finite discrete-state processes and (ii) algorithms for solving the analogue of the Bellman equations (which are called Hamilton-Jacobi-Bellman) together with a discussion of experimental results on toy examples. The theoretical contribution builds on work from the theory of stochastic filtering as well as standard material from RL. The algorithm is guided by ideas from the more traditional setting but essentially they have to solve a PDE with time derivatives so it is not entirely a routine matter to extend standard algorithms.

Strengths: The main strength is that introduces a whole new paradigm and develops the mathematical framework in a principled way. There is a lot of new machinery to digest but they do a good job in explaining the setup. There is too much to be really explained properly in detail but they do well with the limited space that they have.

Weaknesses: I think that the algorithmic work is very preliminary but it is fair enough for the first paper in this area. There is a vast amount of new material that comes with continuous time; they did a good job, but many parts of the framework could not be explained in the space they have. At some point they seem to veer towards continuous space and then shy away from it. I am a little puzzled that they keep writing the phrase "finite countable set". A finite set is *automatically* countable, a countable set may not be finite. More usually one sees the phrase "countably infinite" set but "countable finite" sounds silly and makes me have passing doubts about their familiarity with the subject. However, they have explained the harder mathematical ideas quite well.

Correctness: Yes, the theory seems correct as far as I could check. The empirical results are good but very preliminary.

Clarity: Yes and no. The material is promising and the explanations are as good as one expect in paper of this size. However, a serious reader would have to learn quite a lot of things in order to understand the points. The continuous-time world is very different from the 'real world"

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: Please explain what are the obstacles to having continuous time and space. The framework of HJB and trajectories will all work. One will have to introduce Feller-Dynkin semigroups or something like SDEs. ======================================= Post-rebuttal. I thank you for your well thought out rebuttal. I liked the paper before and I am still in favour of acceptance.

Review 3

Summary and Contributions: ***UPDATE: I have read the response. The core motivations are still unclear, and it is still unclear why discretization in time is a problem - I don't think you need pseudo-observations, I think perhaps you need to marginalize out [or leave as unobserved] time-bins without data. But I am not an expert. *** This paper presents a new model for decision making under uncertainty that involves discrete states / actions, but continuous time. The bulk of the paper is dedicated to a complex technical exposition that outlines the model, defines a notion of value function, and shows the existence of a Bellman-like recursion. Two approximate algorithms for learning the value function are given. Some small experiments illustrate them functioning in small domains.

Strengths: Overall, this is an interesting paper. It is technically sophisticated, and presents an interesting new model class. A few foundational quantities are derived that could be useful for future work. The learning algorithms seem straightforward. The experiments seemed reasonable, and illustrated the working of the algorithms.

Weaknesses: While I felt that this model class was interesting, I did not find it particularly well motivated. This is a complex model; why exactly do we need it? What advantages does it really have over some sort of discretization in time? In a related vein, it wasn't clear at all how the learning algorithms really worked. I am not an expert in this field, and I had a hard time following all of the proofs, so it wasn't clear to me exactly how these algorithms were approximating various intractable quantities. A few more high-level pointers would have been very helpful.

Correctness: Given the technical complexity of the work and my own lack of expertise, I cannot judge this well, although what I did understand seemed correct.

Clarity: It is generally well written, although complex.

Relation to Prior Work: Yes.

Reproducibility: No

Additional Feedback:

Review 4

Summary and Contributions: Post-rebuttal: I would like to thank the authors for the thoughtful response. The main issue for me was clarity, and I'm happy that the authors agreed to improve this aspect of the paper. However, it's hard to increase my score based on this promise alone. Nevertheless, my recommendation should really be considered a borderline recommendation. I will not fight against accepting this paper. ======================================== This paper addresses planning in settings with partial observability and a discrete state space, but with continuous time. This involves both filtering and control. To the best of my understanding, the contributions of this paper are: 1. Defining a model which captures such problems (a POMDP with continuous time), 2. Developing the Hamilton-Jacobi-Bellman (HJB) equation for the optimal value function in such a model, and 3. Approximating the optimal solution using both an offline and an online approach.

Strengths: The problem addressed is interesting and relevant. The proposed approach works well on the empirical evaluation.

Weaknesses: I found the paper hard to follow, and it wasn't even clear to me at first where the contributions of this paper start. The empirical evaluation is limited to very small toy problems. There is some related work using automata models to describe very similar problems, e.g.: * Lars Blackmore, Stanislav Funiak, Brian C. Williams, "Combining Stochastic and Greedy Search in Hybrid Estimation", AAAI 2005: 282-287 * F. Zhao, X. Koutsoukos, H. Haussecker, J. Reich, and P. Cheung, “Distributed monitoring of hybrid systems: A model-directed approach,” IJCAI 2001, pp. * S. Narasimhan and G. Biswas, “An approach to model-based diagnosis of hybrid systems”, in HSCC 2002, pp. 308–322

Correctness: As far as I could tell

Clarity: I found it hard to follow

Relation to Prior Work: Some related work is missing

Reproducibility: Yes

Additional Feedback: