Xuesong Niu, Hu Han, Shiguang Shan, Xilin Chen
Facial action unit (AU) recognition is essential for emotion analysis and has been widely applied in mental state analysis. Existing work on AU recognition usually requires large face datasets with accurate AU labels. However, manual AU annotation requires expertise and is time-consuming. In this work, inspired by co-training methods, we propose a semi-supervised approach to AU recognition that uses a large number of web face images without AU labels together with a small face dataset with AU labels. Unlike traditional co-training methods, which require multi-view features to be provided in advance and the model to be re-trained, we propose a novel co-training method, namely multi-label co-regularization, for semi-supervised facial AU recognition. Two deep neural networks generate multi-view features for both labeled and unlabeled face images, and a multi-view loss is designed to enforce that the features generated from the two views are conditionally independent representations. To obtain consistent predictions from the two views, we further design a multi-label co-regularization loss that minimizes the distance between the AU probability distributions predicted by the two views. In addition, prior knowledge of the relationships between individual AUs is embedded through a graph convolutional network (GCN) to exploit useful information from the large unlabeled dataset. Experiments on several benchmarks show that the proposed approach effectively leverages large datasets of unlabeled face images to improve the robustness of AU recognition and outperforms state-of-the-art semi-supervised AU recognition methods.
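
As a rough illustration of the co-regularization idea, the sketch below computes a consistency penalty between the per-AU probabilities predicted by two views. The abstract only states that a distance between the two predicted distributions is minimized; the choice of Jensen-Shannon divergence, the treatment of each AU as an independent Bernoulli variable, and the names `bernoulli_kl` and `co_regularization_loss` are assumptions made here for illustration, not details confirmed by the source.

```python
import math

EPS = 1e-7  # keeps log() finite when a probability is exactly 0 or 1


def _clip(p):
    """Clamp a probability into the open interval (0, 1)."""
    return min(max(p, EPS), 1.0 - EPS)


def bernoulli_kl(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) for one AU."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def co_regularization_loss(probs_view1, probs_view2):
    """Mean Jensen-Shannon divergence over AUs between two views.

    Each argument is a sequence of per-AU occurrence probabilities for one
    face image. Assumption: JS divergence stands in for the unspecified
    'distance' mentioned in the abstract.
    """
    if len(probs_view1) != len(probs_view2) or len(probs_view1) == 0:
        raise ValueError("both views must predict the same non-empty set of AUs")
    total = 0.0
    for p1, p2 in zip(probs_view1, probs_view2):
        m = 0.5 * (p1 + p2)  # mixture of the two views' predictions
        total += 0.5 * bernoulli_kl(p1, m) + 0.5 * bernoulli_kl(p2, m)
    return total / len(probs_view1)


if __name__ == "__main__":
    view1 = [0.9, 0.1, 0.6]  # hypothetical predictions for three AUs
    view2 = [0.8, 0.2, 0.1]
    print(co_regularization_loss(view1, view2))
```

Because this penalty needs no AU labels, it can be evaluated on unlabeled web face images as well as labeled ones, which is what lets the unlabeled data influence training. It is zero when the two views agree exactly and grows as their predictions diverge.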