AutoLink: Self-supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints
Paper ID: 1177
Please open this webpage in a browser that supports embedded videos.

We visualize the learned graph representation on videos of faces and humans. Although the graph is learned only from a collection of single images, the predictions remain stable and consistent across video frames. Note the subtle eye and mouth shapes on faces, and the fine leg motion on humans.
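The graph shown in these videos links detected keypoints with soft edges rendered into a heatmap. As a rough illustration of that idea (a minimal sketch with hypothetical function names and a Gaussian line falloff, not the paper's exact implementation), each edge can be rasterized from the distance of every pixel to the connecting segment:

```python
import numpy as np

def edge_heatmap(p1, p2, size=64, sigma=2.0):
    """Render one skeleton edge as a soft line heatmap.

    p1, p2: endpoint coordinates in pixels (x, y).
    Each pixel receives exp(-d^2 / sigma^2), where d is its
    distance to the segment from p1 to p2.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    pix = np.stack([xs, ys], axis=-1).astype(float)        # (H, W, 2)
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    seg = p2 - p1
    # Project each pixel onto the segment; clamp to stay between endpoints.
    t = ((pix - p1) @ seg) / max(seg @ seg, 1e-8)
    t = np.clip(t, 0.0, 1.0)
    closest = p1 + t[..., None] * seg
    d2 = ((pix - closest) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def skeleton_map(keypoints, edges, size=64, sigma=2.0):
    """Combine per-edge heatmaps into one skeleton image (max over edges)."""
    maps = [edge_heatmap(keypoints[i], keypoints[j], size, sigma)
            for i, j in edges]
    return np.max(maps, axis=0)
```

For example, `skeleton_map([(10, 10), (50, 10), (30, 50)], [(0, 1), (1, 2)])` produces a single image with bright soft lines along both edges; stacking such maps gives a pose image that downstream networks can consume.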

Application: Pose Transfer

The learned graph representation can be used to train a pose transfer network on videos. Note that the translation models are also trained on single images rather than videos (image translation, not video translation). Nevertheless, the transfer remains stable even under large pose changes (first row), and subtle details such as mouth motion (third row) and eye blinking (fourth row) are captured as well.

Application: Conditional GAN

The learned graph representation can be used to train a conditional GAN. In this experiment, we trained a single detector and GAN on AFHQ, covering multiple animal species at once. This demonstrates robustness to shape variation and the capability of learning a shared animal head model.