Summary and Contributions: The authors propose a reinforcement learning approach to the target coverage problem in directional sensor networks. They introduce a hierarchical multi-agent RL algorithm. The lower level of the hierarchy is the “executor”, a policy network trained using standard RL, while the higher level is a coordination mechanism that relies on attention to identify the contribution of each agent and assign goals to the lower-level executors. The authors provide empirical results showing the advantage of their method over state-of-the-art MARL algorithms as well as optimization techniques specific to the target coverage problem.
Strengths: Although the paper is tailored to the target coverage problem, I believe the coordination mechanism could be applied to the larger class of cooperative multi-agent problems. Using attention mechanisms in MARL is not novel, but the framing of the higher level of the hierarchy as estimating the marginal contribution of each agent is original and interesting. The experiments clearly show the advantage of the approach, and the choice of baselines is sound. The authors propose an ablation study to identify the benefits of the different components they introduce.
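To make the "marginal contribution" framing concrete, here is a minimal sketch of the idea as I understand it: an agent's contribution is the drop in team value when that agent is removed. All names (`coverage`, `marginal_contributions`) and the toy unit-distance coverage model are illustrative assumptions, not the paper's actual implementation, which estimates this with an attention-based network rather than by exact recomputation.

```python
def coverage(agents, targets):
    """Team value: number of targets covered by at least one agent
    (toy model: an agent covers a target within unit distance)."""
    covered = set()
    for tx, ty in targets:
        for ax, ay in agents:
            if (ax - tx) ** 2 + (ay - ty) ** 2 <= 1.0:
                covered.add((tx, ty))
                break
    return len(covered)

def marginal_contributions(agents, targets):
    """Contribution of agent i = V(all agents) - V(all agents except i)."""
    total = coverage(agents, targets)
    return [total - coverage(agents[:i] + agents[i + 1:], targets)
            for i in range(len(agents))]

agents = [(0.0, 0.0), (3.0, 0.0)]
targets = [(0.5, 0.0), (3.5, 0.0), (3.0, 0.5)]
print(marginal_contributions(agents, targets))  # -> [1, 2]
```

Here agent 0 uniquely covers one target and agent 1 uniquely covers two, so their marginal contributions are 1 and 2; the coordinator can use such estimates to rank agents when assigning goals.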
Weaknesses: The paper is targeted toward the specific problem of target coverage. Even if many components could generalize, it is hard to assess from the experiments whether that would be the case, since only one domain is considered. Even though it is discussed, it would have been interesting to show quantitative results on the impact of the two-step training (executor first, then coordinator) versus an end-to-end method. Could you provide some insights? The authors provide a well-crafted reward function for the problem of interest; it would be interesting to study the influence of this design on the overall performance. How much would that affect the results? How does the scalability of this method compare with other algorithms in terms of the number of targets and the number of agents? Would the centralized coordinator scale to more than 20 agents? This claim: "the critic based on such global feature cannot work well" needs further justification.
Correctness: The methods are correct and the experiments seem to be carried out correctly (repeated trials, various baselines, relevant metrics reported), except for the training episodes: in the appendix, the authors show that they use fewer training episodes for the baselines than for their own method. Why? Is that still a fair assessment?
Clarity: The description of the coordinator could be improved. In particular: “There has been ID in the pairwise observation to distinguish them from each other. Thus, their order becomes not quite meaningful.” How are the agents and targets assigned IDs? Why wouldn't the order matter here? The filtering component is barely explained. What is its role? Why is it needed? And how is it implemented? Formatting: L212: "evaluation"; the labels in Figure 2 are not readable.
Relation to Prior Work: The authors cite the relevant body of literature and the relation is pretty clear.
Additional Feedback: The filtering part needs more details if one wanted to reimplement the algorithm. The authors provide code for both the environment and the algorithm. I have read the authors' response and wish to keep my score.
Summary and Contributions: The authors consider a directional sensor network with a number of randomly moving targets. A number of sensors are deployed in the field at fixed locations and are tasked with maximizing the number of targets they capture. The sensors can only capture information within a certain angle; however, it is assumed that the sensors can rotate (change the direction of capture). The authors formulate this problem as one of multi-agent reinforcement learning. Inspired by the model of Hierarchical Reinforcement Learning, they propose an approach called Hierarchical Multi-agent Coordination Mechanism. The hierarchy consists of a centralized coordinator and the sensors. One of the challenges the authors address in their model is the large number of candidate combinations for target selection at the individual sensor nodes, which they handle with an attention-based representation.
Strengths: Overall, the authors show a good understanding of multi-agent reinforcement learning and apply it to a concrete application area.
Weaknesses: - It is not clear that the application area calls for a multi-agent model here. If there is a central observer that can observe everything, the interests of the nodes are exactly aligned, and there is real-time communication, this is essentially the case of centralized control. - The authors essentially only compare with other MARL approaches and do not give serious thought to what other alternatives exist. For instance, for the problem sizes the authors discuss (4 sensors and 5 targets), the optimal direction of the sensors can be found trivially through search. The authors also do not discuss whether their approach outperforms trivial distributed heuristics (for instance: each agent should make sure that it tracks the target closest to it and, within this constraint, try to track other targets as well as possible). - It is not clear that the proposed approach would generalize to any other application.
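For concreteness, the distributed heuristic suggested above could look like the following sketch: each sensor aims its fixed-width field of view at its nearest target, and other targets are covered only if they happen to fall inside. The function names, the 60° field of view, and the toy geometry are my own illustrative assumptions, not anything from the paper.

```python
import math

def aim_at_nearest(sensor, targets):
    """Return the viewing angle that points the sensor at its nearest target."""
    nearest = min(targets,
                  key=lambda t: (t[0] - sensor[0]) ** 2 + (t[1] - sensor[1]) ** 2)
    return math.atan2(nearest[1] - sensor[1], nearest[0] - sensor[0])

def targets_in_view(sensor, angle, targets, fov=math.pi / 3):
    """Targets whose bearing lies within +/- fov/2 of the chosen viewing angle."""
    hit = []
    for t in targets:
        bearing = math.atan2(t[1] - sensor[1], t[0] - sensor[0])
        diff = (bearing - angle + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi]
        if abs(diff) <= fov / 2:
            hit.append(t)
    return hit

sensor = (0.0, 0.0)
targets = [(1.0, 0.0), (2.0, 0.2), (-1.0, 1.0)]
angle = aim_at_nearest(sensor, targets)            # points toward (1.0, 0.0)
print(len(targets_in_view(sensor, angle, targets)))  # -> 2
```

A comparison against a baseline of roughly this shape (each sensor acting greedily with no coordination) would make it much clearer how much of the reported gain comes from the learned coordinator.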
Correctness: As far as I can tell, the claims, method and evaluation approach are correct.
Clarity: Overall, the paper is well written.
Relation to Prior Work: I am not aware of prior work tackling this particular challenge.
Additional Feedback: I have read the authors' feedback. I appreciate the authors running the simple heuristics I proposed - hopefully the authors agree with me that a non-learning heuristic outperforming all the other approaches except the proposed one raises some questions about the structure of the problem. I don't find the answer about the need for MARL convincing - that the coordinator can only communicate every 10 "steps" - where does the 10 come from? Also, how long is a "step" - is it 1 ms, 1 s, 1 min? Is this for sports videos or for airplane tracking?
Summary and Contributions: 1) This paper proposes a Hierarchical Multi-agent Coordination Mechanism for solving the target coverage problem in DSNs. 2) An attention mechanism and approximating the marginal contribution (AMC) are utilized to improve the performance. 3) Experiments in simulators illustrate the effectiveness of the proposed method.
Strengths: - Overall, the paper is well written and clear. - The paper presents a hierarchical multi-agent coordination mechanism for the task of enhancing target coverage in DSNs. - The design choice seems reasonable and its intuition is well motivated. - Ablation studies show the effectiveness of the proposed components. - Details and code are provided to improve reproducibility.
Weaknesses: 1) The task of enhancing target coverage in Directional Sensor Networks (DSNs) is important and challenging. However, as far as I am concerned, it is not a standard benchmark environment for studying multi-agent reinforcement learning. The proposed method/model design targets a specific problem, limiting its significance. There already exist several popular environments for multi-agent cooperation; if experiments were conducted on these standard benchmarks, the significance of this work for the machine learning (ML) and reinforcement learning (RL) community would be improved. 2) The design choice is reasonable; however, the technical novelty seems a little low. Attention-based encoders have been widely used in the literature to handle variable-length inputs, and modeling the marginal contribution seems straightforward. 3) As shown in Table 1, the average gain of the proposed method is not as good as that of COMA or ILP. Could the authors comment on this?
Correctness: The description of the method is mostly clear. The design choices of the components are well-validated.
Clarity: The submission is clearly written and well organized.
Relation to Prior Work: The related work section is mostly clear and explains the differences from existing approaches.
Additional Feedback: Question 1: It seems that the authors will release the code/environments to the public. Can the authors confirm this? ---------------------------------------------------------------------------------- Update: After reading the other reviewers' comments and the author responses, I share the reviewers' concerns about the limited application areas and experimental settings. Regarding my own concerns, the author responses did not really address my concerns about limited novelty and the lack of benchmark experiments. The paper targets solving the specific problem of multiple-target coverage using a hierarchical multi-agent algorithm. However, since it is not a standard benchmark, the effectiveness of the algorithm remains a concern. I would like to keep my original score.