Reviews: Pareto Multi-Task Learning

The paper extends recent work (NeurIPS 2018) that poses multi-task learning as multi-objective optimization. Although the submission is somewhat incremental, it is significant. The earlier approach finds only a single, arbitrary point on the Pareto efficiency curve, which is a significant limitation: practitioners would rather recover the entire curve. The submission overcomes this limitation, and the empirical results support the claim and show the significance of the method.

That said, the submission can still be improved significantly. I strongly recommend that the authors take the following actions before the camera-ready deadline:

1 - Improve the presentation of the paper. Some of the reviewers found the paper hard to read. I agree with the concern, although I find the required edits minor and easy to make before the camera-ready deadline.

2 - Discuss the scalability of the method in terms of both memory and computation. I think the proposed method would have a hard time scaling to a large set of tasks. For example, [12] treats multi-label classification as a multi-task problem with 40 tasks. How many regions would be needed to efficiently cover the Pareto efficiency curve of a problem with 40 tasks? Additional experiments and/or a discussion would be useful.
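To give a rough sense of the scalability question, the sketch below counts how many regions a uniform covering would need. It rests on an assumption that is not taken from the submission: each region is associated with one preference vector, and the vectors are placed on an evenly spaced simplex lattice (non-negative weights in steps of 1/divisions that sum to 1). The function name and the `divisions` parameter are hypothetical and chosen only for illustration.

```python
from math import comb


def num_preference_vectors(num_tasks: int, divisions: int) -> int:
    """Count the points of an evenly spaced simplex lattice.

    Assumption (not from the paper): one region per preference vector,
    with vectors w >= 0, sum(w) == 1, and each w_i a multiple of
    1 / divisions. By the stars-and-bars argument, the number of such
    vectors is C(divisions + num_tasks - 1, num_tasks - 1).
    """
    if num_tasks < 1 or divisions < 0:
        raise ValueError("num_tasks must be >= 1 and divisions must be >= 0")
    return comb(divisions + num_tasks - 1, num_tasks - 1)


if __name__ == "__main__":
    # Same lattice resolution, increasing number of tasks.
    for m in (2, 3, 10, 40):
        print(m, num_preference_vectors(m, divisions=4))
```

Under this assumption, the count grows combinatorially with the number of tasks even at a coarse resolution, which is why a discussion of how the regions are chosen for many-task problems would be helpful. A non-uniform or sampled set of preference vectors would change the numbers.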

Paper ID:	6489
Title:	Pareto Multi-Task Learning