NeurIPS 2020

Multi-agent active perception with prediction rewards

Meta Review

This paper addresses the problem of multi-agent active perception, a somewhat nascent area, and proposes a reformulation of Dec-\rho-POMDPs as Dec-POMDPs through the addition of a final-stage “predictive action.” The reviewers appreciated the novelty of this contribution as well as the theoretical analysis and loss bounds. The original reviews raised a number of questions, many of which the author response addressed. However, some issues remain that undercut the significance of the contribution, including, among others, the somewhat incremental combination and adaptation of existing techniques, and the fact that the claimed scalability is not demonstrated very convincingly in the experiments. On my reading of the paper, I largely concur. I do not reiterate the positive contributions noted in the other reviews, but point out some concerns about importance and impact:

1. The modeling seems somewhat incremental. The authors are upfront about this, stating that they aim to apply the action prediction “trick” of POMDP-IR for \rho-POMDPs to Dec-\rho-POMDPs. This does not diminish the contribution; it is simply an observation about its novelty.

2. The theoretical results seem correct and marginally interesting, but do not necessarily seem that impactful. The bounds in Thm. 1 seem quite loose, though the idea of loss from decentralization is a nice one, so conceptually the contribution is good. Cor. 1 strikes me as less interesting: unless I am misinterpreting it, the condition means that no hidden state correlates observations among the agents, hence each agent has a completely independent information-gathering task. If that is the case, I would argue it is not really a decentralized information task. If this interpretation is wrong, the author(s) should make clear why this result is interesting.

3. When alpha vectors are introduced (effectively) as a way to make the prediction task discrete in Sec. 3.1 (Defn 2), my initial reaction was that this is a major *approximation* to a general convex reward function (e.g., negative entropy) that should not be made without comment or without providing error bounds. Fortunately (and finally), some discussion of this was offered in Sec. 5, along with an adaptive algorithm for selecting these vectors. But this seems like an overlooked opportunity for impactful analysis (e.g., of the error introduced by this discretization, whether fixed or adaptive).

4. While this is a nascent area, the test problems (even if drawn from Lauri et al. 2019) still seem very toy-like (2 agents, very short horizons, a very small set of predictive actions). This is perhaps acceptable for this area at this point in its development, but more impressive empirical results would overcome some of the concerns about the novelty and the depth and impact of the theoretical analysis.
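
To make the discretization concern in point 3 concrete, the following is a minimal sketch, not the paper's algorithm, of how a finite set of alpha vectors lower-bounds negative entropy and how the resulting gap can be measured. It assumes a two-state belief and three hand-picked tangent points; the function names and the choice of tangent points are hypothetical and serve only as illustration. On the probability simplex, the tangent hyperplane to negative entropy at a belief b0 is given by the alpha vector with entries log b0_i, and the gap at a belief b equals the KL divergence from b to the best-matching tangent point.

```python
import math

def neg_entropy(b):
    """Negative entropy sum_i b_i log b_i (convex in b), with 0 log 0 = 0."""
    return sum(p * math.log(p) for p in b if p > 0.0)

def tangent_alpha(b0):
    """Alpha vector of the tangent hyperplane to negative entropy at b0.

    For beliefs on the simplex the tangent reduces to alpha_i = log b0_i,
    which requires every entry of b0 to be strictly positive.
    """
    return [math.log(p) for p in b0]

def pwlc_reward(b, alphas):
    """Piecewise-linear convex reward: best alpha vector at belief b."""
    return max(sum(a * p for a, p in zip(alpha, b)) for alpha in alphas)

def approximation_gap(b, alphas):
    """Non-negative error of the alpha-vector approximation at belief b."""
    return neg_entropy(b) - pwlc_reward(b, alphas)

# Hypothetical tangent points on the two-state belief simplex (an assumption).
tangent_points = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
alphas = [tangent_alpha(b0) for b0 in tangent_points]

# Evaluate the gap on a grid of beliefs to find the worst-case error.
grid = [[i / 100, 1 - i / 100] for i in range(101)]
worst_gap = max(approximation_gap(b, alphas) for b in grid)
print(f"worst-case gap on the grid: {worst_gap:.4f}")
```

The gap vanishes at the tangent points and grows between them, which is the kind of quantity an error bound for a fixed or adaptive choice of alpha vectors would need to control.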