We visualize the learned graph representation on videos of faces and humans. Although the graph is learned only from a collection of single images, the detector is simply applied to each frame independently, and the resulting detections remain stable and temporally consistent. Note the subtle eye and mouth shapes on faces and the fine leg motion on humans.
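Below is a minimal sketch of how such overlays can be produced: the image-trained detector is run on every frame independently, with no temporal smoothing. The video path and the `detect_graph` interface are hypothetical stand-ins for the actual detector; the stub returns random output only so the script is self-contained and runnable.

```python
import cv2
import numpy as np

K = 10  # number of graph nodes (assumed)

def detect_graph(frame):
    """Hypothetical stand-in for the trained detector: returns (K, 2) keypoints
    in pixel coordinates and a (K, K) edge-weight matrix. Replace this stub
    with the real model's forward pass."""
    h, w = frame.shape[:2]
    kp = np.random.rand(K, 2) * [w, h]
    edges = np.random.rand(K, K)
    return kp, edges

cap = cv2.VideoCapture("face_video.mp4")  # hypothetical input clip
writer = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    kp, edges = detect_graph(frame)  # per-frame, no temporal information used
    pts = kp.astype(int)
    # Draw edges above a weight threshold, then the nodes on top.
    for i in range(K):
        for j in range(i + 1, K):
            if edges[i, j] > 0.5:
                cv2.line(frame, tuple(map(int, pts[i])),
                         tuple(map(int, pts[j])), (0, 255, 0), 1)
    for p in pts:
        cv2.circle(frame, tuple(map(int, p)), 3, (0, 0, 255), -1)
    if writer is None:
        h, w = frame.shape[:2]
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        writer = cv2.VideoWriter("overlay.mp4", fourcc, 30.0, (w, h))
    writer.write(frame)
cap.release()
if writer is not None:
    writer.release()
```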
The learned graph representation can be used to train a pose transfer network, demonstrated here on videos. Note that the translation models are likewise trained on single images rather than videos (image translation, not video translation). Nevertheless, the transfer is stable even under large pose changes (first row), and subtle details such as mouth motion (third row) and eye blinking (fourth row) are also captured.
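The following is a minimal sketch of one way to train such a network on single images, under stated assumptions: the generator learns to reconstruct an image from its own detected graph (self-reconstruction), and at test time the graph is swapped for one detected on a driving frame. All module definitions (`Detector`, `Translator`) and hyperparameters are hypothetical single-layer stand-ins, not the actual architecture; in particular, a real system would bottleneck the appearance pathway so that pose information cannot leak through it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, H, W = 10, 128, 128  # graph size and image resolution (assumed)

class Detector(nn.Module):
    """Stand-in for the frozen, pre-trained graph detector: image -> K soft
    keypoint heatmaps serving as the structure/pose conditioning signal."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, K, 7, padding=3)
    def forward(self, x):
        h = self.conv(x)
        return h.flatten(2).softmax(-1).view_as(h)

class Translator(nn.Module):
    """Stand-in generator: (appearance image, graph heatmaps) -> image.
    A real system bottlenecks the appearance input to prevent pose leakage."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3 + K, 3, 7, padding=3)
    def forward(self, app, graph):
        return self.conv(torch.cat([app, graph], dim=1))

detector = Detector().eval()   # frozen; trained beforehand on single images
translator = Translator()
opt = torch.optim.Adam(translator.parameters(), lr=2e-4)

for _ in range(100):                    # stands in for a single-image loader
    img = torch.rand(4, 3, H, W)
    with torch.no_grad():
        graph = detector(img)           # structure of this very image
    recon = translator(img, graph)      # self-reconstruction objective
    loss = F.l1_loss(recon, img)        # perceptual/GAN losses are also common
    opt.zero_grad(); loss.backward(); opt.step()

# Test-time transfer: appearance from the source, graph from the driving frame.
src, drv = torch.rand(1, 3, H, W), torch.rand(1, 3, H, W)
with torch.no_grad():
    transferred = translator(src, detector(drv))
```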
The learned graph representation can also be used to train a conditional GAN. In this experiment, we trained a single detector and GAN on AFHQ, modeling multiple animal species at the same time. This demonstrates robustness to shape variation and the ability to learn a shared animal head model.
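A minimal sketch of this conditioning scheme, under stated assumptions: the frozen detector supplies the graph for each real AFHQ image, the generator synthesizes an image from noise plus that graph, and the discriminator scores (image, graph) pairs with a standard non-saturating GAN loss. One model covers all animal classes. All modules, sizes, and the loss choice are hypothetical stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, H, W, Z = 10, 64, 64, 32  # nodes, resolution, noise dim (assumed)

class Detector(nn.Module):
    """Stand-in for the single pre-trained detector shared across species."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, K, 7, padding=3)
    def forward(self, x):
        h = self.conv(x)
        return h.flatten(2).softmax(-1).view_as(h)

class Generator(nn.Module):
    """Stand-in generator: (noise, graph heatmaps) -> image."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(Z + K, 3, 7, padding=3)
    def forward(self, z, graph):
        zmap = z[:, :, None, None].expand(-1, -1, H, W)  # broadcast noise
        return torch.tanh(self.conv(torch.cat([zmap, graph], dim=1)))

class Discriminator(nn.Module):
    """Stand-in discriminator over (image, graph) pairs."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3 + K, 1, 7, padding=3)
    def forward(self, img, graph):
        return self.conv(torch.cat([img, graph], dim=1)).mean([1, 2, 3])

detector, G, D = Detector().eval(), Generator(), Discriminator()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

for _ in range(100):                     # stands in for an AFHQ loader
    real = torch.rand(4, 3, H, W)        # cats, dogs, and wild animals mixed
    with torch.no_grad():
        graph = detector(real)           # one detector for all species
    z = torch.randn(4, Z)
    fake = G(z, graph)

    # Discriminator step: real pairs vs. detached fake pairs.
    d_loss = (F.binary_cross_entropy_with_logits(D(real, graph), torch.ones(4))
              + F.binary_cross_entropy_with_logits(D(fake.detach(), graph),
                                                   torch.zeros(4)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: fool the discriminator on the same graph.
    g_loss = F.binary_cross_entropy_with_logits(D(fake, graph), torch.ones(4))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```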