Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent Task Assignment against Negative Transfer

Yang, Zhiyong; Xu, Qianqian; Jiang, Yangbangyan; Cao, Xiaochun; Huang, Qingming

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Abstract

In multi-task learning, a major challenge is the notorious issue known as negative transfer: sharing knowledge with dissimilar or hard tasks often degrades performance. To circumvent this issue, we propose a novel multi-task learning method that simultaneously learns latent task representations and a block-diagonal Latent Task Assignment Matrix (LTAM). Unlike most previous work, pursuing the block-diagonal structure of the LTAM (which assigns latent tasks to output tasks) alleviates negative transfer by collaboratively grouping latent tasks and output tasks so that inter-group knowledge transfer and sharing are suppressed. This goal is challenging, since 1) our notion of the block-diagonal property extends the traditional notion for square matrices, where the $i$-th row and the $i$-th column represent the same concept; and 2) marginal constraints on rows and columns are also required to avoid isolated latent or output tasks. To address these challenges, we propose a novel regularizer based on an equivalent spectral condition that realizes this generalized block-diagonal property. In practice, we provide a relaxation scheme that improves the flexibility of the model. Given the objective function, we then propose an alternating optimization method, which not only shows how negative transfer is alleviated in our method but also reveals an interesting connection between our method and the optimal transport problem. Finally, the method is demonstrated on a simulation dataset and three real-world benchmark datasets, and is further applied to personalized attribute prediction.
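The abstract does not spell out the spectral condition or the regularizer, so the following is only a minimal sketch of the standard graph-theoretic fact that such a condition can build on: if a non-negative assignment matrix is treated as the biadjacency matrix of a bipartite graph between latent tasks and output tasks, the number of zero eigenvalues of the graph Laplacian equals the number of connected components, that is, the number of blocks. The function names `bipartite_laplacian` and `count_blocks`, the tolerance, and the example matrices are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def bipartite_laplacian(S):
    """Laplacian of the bipartite graph whose biadjacency matrix is S.

    Rows of S are latent tasks, columns are output tasks (an assumed layout).
    """
    k, T = S.shape
    # Symmetric adjacency over all k + T nodes: edges only run across the two sides.
    A = np.block([[np.zeros((k, k)), S],
                  [S.T, np.zeros((T, T))]])
    return np.diag(A.sum(axis=1)) - A

def count_blocks(S, tol=1e-8):
    """Count (near-)zero Laplacian eigenvalues, i.e. connected components."""
    eigvals = np.linalg.eigvalsh(bipartite_laplacian(S))
    return int(np.sum(np.abs(eigvals) < tol))

# Hypothetical 3 x 4 assignment matrix with two groups:
# latent tasks {0, 1} serve output tasks {0, 1}; latent task {2} serves {2, 3}.
S_block = np.array([[0.6, 0.4, 0.0, 0.0],
                    [0.4, 0.6, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])

# One cross-group entry links the two groups into a single component.
S_mixed = S_block.copy()
S_mixed[0, 2] = 0.3

n_block = count_blocks(S_block)
n_mixed = count_blocks(S_mixed)
print(n_block, n_mixed)
```

In this toy case the first matrix has two blocks, and a single cross-group entry merges them into one. Note that the matrix is rectangular, so rows and columns index different objects, which is why the bipartite construction is used here instead of a Laplacian built directly on a square affinity matrix.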
