Learning Motion Refinement for Unsupervised Face Animation

Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

Bibtex Paper Supplemental


Jiale Tao, Shuhang Gu, Wen Li, Lixin Duan


Unsupervised face animation aims to generate a human face video based on theappearance of a source image, mimicking the motion from a driving video. Existingmethods typically adopted a prior-based motion model (e.g., the local affine motionmodel or the local thin-plate-spline motion model). While it is able to capturethe coarse facial motion, artifacts can often be observed around the tiny motionin local areas (e.g., lips and eyes), due to the limited ability of these methodsto model the finer facial motions. In this work, we design a new unsupervisedface animation approach to learn simultaneously the coarse and finer motions. Inparticular, while exploiting the local affine motion model to learn the global coarsefacial motion, we design a novel motion refinement module to compensate forthe local affine motion model for modeling finer face motions in local areas. Themotion refinement is learned from the dense correlation between the source anddriving images. Specifically, we first construct a structure correlation volume basedon the keypoint features of the source and driving images. Then, we train a modelto generate the tiny facial motions iteratively from low to high resolution. Thelearned motion refinements are combined with the coarse motion to generate thenew image. Extensive experiments on widely used benchmarks demonstrate thatour method achieves the best results among state-of-the-art baselines.