<html>
<head>

<style>
body {background-color: #eeeeee;}

h2 {
  padding-top: 100px;
}


h3 {
  padding-top: 50px;
  text-align: center;
}


div.content {
  background-color: #ffffff;
  margin:0 auto;
  max-width: 800px;
  padding-top: 100px;
  padding-right: 100px;
  padding-bottom: 100px;
  padding-left: 100px;
  text-align: center;
  font-family: sans-serif;
}

</style>
</head>

<body>

<div class="content" id="content" name="content">

<h1>Unsupervised learning of object structure and dynamics from videos<br><br>SUPPLEMENTAL VIDEOS</h1>

<h2>Video generation quality across models (Human3.6M)</h2>

<p>
Comparison of video generation quality across models. The marker on the left is green for observed frames and red for predicted frames. Columns show different examples.
</p>

<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_model_comparison_ex0_width722.0.mp4" type="video/mp4"></video>
<br><br>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_model_comparison_ex10_width722.0.mp4" type="video/mp4"></video>
<br><br>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_model_comparison_ex20_width722.0.mp4" type="video/mp4"></video>

<h2>Sample diversity (Human3.6M)</h2>

<p>
Videos in the same row were conditioned on the same observed frames.
</p>

<h3>Example 1</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex0_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 2</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex10_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 3</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex20_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 4</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex30_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 5</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex40_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 6</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex50_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 7</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex60_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 8</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex70_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 9</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex80_width722.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Example 10</h3>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_sample_diversity_ex90_width722.0.mp4" type="video/mp4"></video>
<br><br>

<h2>Keypoint manipulation (Human3.6M)</h2>

<p>
Keypoints for each limb were manually identified based on the left-most image. Keypoints for a single limb were then manipulated by rotating them around the joint of the limb, while holding the other keypoints static. Columns show different examples.
</p>

<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/human_manipulation_width722.0.mp4" type="video/mp4"></video>
<br><br>

<h2>Video generation quality across models (Basketball)</h2>

<p>
Comparison of video generation quality across models. The marker on the left is green for observed frames and red for predicted frames. Each column shows a different example.
</p>

<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/basketball_model_comparison_ex0_width722.0.mp4" type="video/mp4"></video>
<br><br>
<video width="722" height="auto" autoplay loop muted playsinline><source src="videos/basketball_model_comparison_ex10_width722.0.mp4" type="video/mp4"></video>
<br><br>

<h2>Action-conditional video generation quality (DMCS)</h2>

<p>
Video generation quality for the DeepMind Control Suite dataset. A single model was trained on data from all tasks. Columns show different examples.
</p>

<h3>Acrobot</h3>
<video width="392" height="auto" autoplay loop muted playsinline><source src="videos/dmcs_model_comparison_acrobot_width392.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Cartpole</h3>
<video width="392" height="auto" autoplay loop muted playsinline><source src="videos/dmcs_model_comparison_cartpole_width392.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Cheetah</h3>
<video width="392" height="auto" autoplay loop muted playsinline><source src="videos/dmcs_model_comparison_cheetah_width392.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Reacher</h3>
<video width="392" height="auto" autoplay loop muted playsinline><source src="videos/dmcs_model_comparison_reacher_width392.0.mp4" type="video/mp4"></video>
<br><br>
<h3>Walker</h3>
<video width="392" height="auto" autoplay loop muted playsinline><source src="videos/dmcs_model_comparison_walker_width392.0.mp4" type="video/mp4"></video>
</div>
</body>
</html>
