Learning Disentangled Representations of Videos with Missing Data

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Authors

Armand Comas, Chi Zhang, Zlatan Feric, Octavia Camps, Rose Yu

Abstract

Missing data poses significant challenges for learning representations of video sequences. We present the Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable, disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object, and imputes each object's trajectory where data is missing. On a Moving MNIST dataset with a variety of missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code is available at https://github.com/Rose-STL-Lab/DIVE.
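
To make the factorization concrete, below is a minimal PyTorch-style sketch of how per-object frame features might be split into appearance, pose, and missingness latents, with a simple imputation step that falls back on a trajectory prior when an object is missing. All module and variable names here are hypothetical illustrations, not the authors' code; the actual DIVE architecture is in the linked repository.

```python
# Hypothetical sketch of DIVE-style disentangled latents with pose imputation.
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Maps per-object frame features to appearance, pose, and missingness latents."""
    def __init__(self, feat_dim=128, app_dim=32, pose_dim=4):
        super().__init__()
        self.appearance = nn.Linear(feat_dim, app_dim)  # appearance factor per object
        self.pose = nn.Linear(feat_dim, pose_dim)       # per-frame pose (e.g., position/scale)
        self.missing = nn.Linear(feat_dim, 1)           # missingness logit per object/frame

    def forward(self, feats):
        # feats: (T, N, feat_dim) -- T frames, N objects
        z_app = self.appearance(feats)
        z_pose = self.pose(feats)
        m = torch.sigmoid(self.missing(feats))          # probability the object is missing
        # Impute pose where data is missing: blend the observed pose with a
        # trajectory prior (here, simply the previous frame's pose).
        z_prev = torch.cat([z_pose[:1], z_pose[:-1]], dim=0)
        z_imputed = m * z_prev + (1 - m) * z_pose
        return z_app, z_imputed, m
```

The key design point this sketch illustrates is that missingness is an explicit latent factor rather than a preprocessing step, so the model can gate between observed evidence and a learned trajectory prior on a per-object, per-frame basis.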