<!DOCTYPE html>
<html>

<head>
    <title>Canonical Capsules: Self-Supervised Capsules in Canonical Pose</title>
</head>
<style type="text/css">
    body {
        margin: 0;
        padding: 0;
        text-align: center;
    }

    div.container {
        width: 60%;
        margin: 0 auto;
        text-align: justify;
    }

    .teaser_vid {
        /* border: 2px solid gray; */
        width: 60%;
        margin: 0 auto;
        text-align: center;
    }
    .center {
        display: block;
        margin-left: auto;
        margin-right: auto;
        width: 90%;
    }


    #title {
        font-family: 'Times New Roman', serif;
        font-size: 20px;
        line-height: 14px;
        text-transform: uppercase;
        letter-spacing: 2px;
        font-weight: bold;
        color: #444;
    }
    #list{
        font-family: Times New Roman, times, times-roman, georgia, serif;
        font-size: 20px;
        line-height: 30px;
        margin: 0 auto;
        /* width: 70%; */
        text-align: left;
        color: #444;
        /* background-color: #7e7b7b34; */
    }

    #authors{
        font-family: Fanwood Text;
        font-size: 20px;
        line-height: 14px;
        /* text-transform: uppercase; */
        /* letter-spacing: 1px; */
        font-weight: bold;
        /* color: #444; */
    }
    #material{
	    margin-top: 10px;
	    font-family: 'Times New Roman', serif;
	    font-size: 20px;
	    text-align: center;
        font-weight: bold;
	    margin-left: auto;
	    margin-right: auto;
    }
    #bar{
        height: 1px; 
        background-color: #0c0c0c52;
    }

    div.container p {
        font-family: Times New Roman, times, times-roman, georgia, serif;
        font-size: 20px;
        line-height: 30px;
        margin: 0 auto;
        /* width: 70%; */
        text-align: justify;
        color: #444;
        /* background-color: #7e7b7b34; */
    }

    div.container h2 {
        font-family: times, Times New Roman, times-roman, georgia, serif;
        font-size: 30px;
        line-height: 40px;
        letter-spacing: 0px;
        color: #444;
    }

    div.container h3 {
        font-family: times, Times New Roman, times-roman, georgia, serif;
        font-size: 25px;
        line-height: 20px;
        letter-spacing: -1px;
        color: #444;
        /* background-color: #7e7b7b34; */
    }
</style>

<body>

    <div class="container">
        <br>
        <br>
        <div id="bar"></div>
        <H2 style="text-align: center;"> Canonical Capsules: Self-Supervised Capsules in Canonical Pose </H2>
        <div id="title" style="text-align: center;"> Paper ID: 1370</div>
        <br>
        <div id="bar"></div>
        <br>
        <H3 style="text-align: center;"> Overview of Supplementary Material</H3>
        <div id="list">
        In this supplementary material, we provide:
            <ol>
            <li><b>The teaser video</b>, which briefly summarizes our framework.
            </li>
            <li><b>The supplementary appendix</b>, which provides architectural details and additional results.
            </li>
            <li><b>Qualitative results of canonicalization</b>, which show the stability of canonicalization.
            </li>
            <li><b>Qualitative results of reconstruction and decomposition</b> in the unaligned setup.
            </li>
            </ol> 
        </div>
        <br>
        <div id="bar"></div>
        <br>
        <H3 style="text-align: center;"> Teaser Video</H3>

        <p>
            <b>TL;DR:</b> A self-supervised capsule architecture that canonicalizes data 
            while simultaneously decomposing point clouds into parts to perform unsupervised representation learning.
        </p><br>
        <p>
            <b>Abstract:</b>
            We propose an unsupervised capsule architecture for 3D point clouds. 
            We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. 
            Our key idea is to aggregate the attention masks into semantic keypoints, 
            and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. 
            This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. 
            To train our neural network we require neither classification labels nor manually-aligned training datasets. 
            Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
        </p>
        <br><br>
        <!-- <img src="videos/teaser.mp4" width=90% class="center"> -->
        <video width="100%" loop autoplay muted>
            <source src="videos/teaser.mp4" type="video/mp4">
        </video>
        <br>
        <div id="bar"></div>
        <H3 style="text-align: center;"> Supplementary Appendix</H3>
        <p>
        Architectural details, additional ablation studies, and qualitative results for the aligned setup are available in the supplementary appendix.
        For more details, <b>click</b> the image below to access the <a href="pdf/supplementary_appendix.pdf">PDF</a>.<br><br>
        </p>
        <div style="text-align: center;">
            <a href="pdf/supplementary_appendix.pdf" style="text-align: center;">
                <img src="img/icon.png" alt="Supplementary Appendix PDF" width="70%" align="center"
                    style="border: 1px solid black;">
            </a>
        </div>
        <br><br>
        <div id="bar"></div>
        <H3 style="text-align: center;"> Code</H3>
        <p>
        We provide [<a href="code/">code in the accompanying subfolder</a>]. 
        Please see [<a href="code/README.md">README.md</a>] for detailed instructions regarding the code.
        </p>
        <br><br>
        <div id="bar"></div>
        <H3 style="text-align: center;"> Canonicalization</H3>
        <p>
            The example videos below demonstrate the quality of canonicalization. Our method achieves more stable canonicalization than Compass, as shown by the chairs and airplanes remaining well-aligned despite changes in appearance.
        </p>
        <table class="teaser_vid" style="width: 100%;">
            <tr>
                <td colspan="3">
                    <video width="100%" loop autoplay muted>
                        <source src="videos/combined_CAT4_10_compressed.mp4" type="video/mp4">
                    </video>
                </td>
                <td colspan="3">
                    <video width="100%" loop autoplay muted>
                        <source src="videos/combined_CAT9_3_compressed.mp4" type="video/mp4">
                    </video>
                </td>
            </tr>
            <tr>
                <td colspan="3">
                    <video width="100%" loop autoplay muted>
                        <source src="videos/combined_CAT4_10_compressed.mp4" type="video/mp4">
                    </video>
                </td>
                <td colspan="3">
                    <video width="100%" loop autoplay muted>
                        <source src="videos/combined_CAT9_12_compressed.mp4" type="video/mp4">
                    </video>
                </td>
            </tr>
            <tr>
                <td colspan="3">
                    <video width="100%" loop autoplay muted>
                        <source src="videos/combined_CAT4_15_compressed.mp4" type="video/mp4">
                    </video>
                </td>
                <td colspan="3">
                    <video width="100%" loop autoplay muted>
                        <source src="videos/combined_CAT9_7_compressed.mp4" type="video/mp4">
                    </video>
                </td>
            </tr>

            <tr>
                <td width=16.66%> Input </td>
                <td width=16.66%> Aligned by <i>Ours</i> </td>
                <td width=16.66%> Aligned by Compass </td>
                <td width=16.66%> Input </td>
                <td width=16.66%> Aligned by <i>Ours</i> </td>
                <td width=16.66%> Aligned by Compass </td>
            </tr>
        </table>

        <br><br>
        <div id="bar"></div>
        <H3 style="text-align: center;"> Reconstruction and Decomposition</H3>
        <p> We show qualitative highlights, where we decompose 3D point
            clouds and auto-encode them using Canonical Capsules. We color each
            Canonical Capsule with a unique color, and similarly color "patches"
            from the reconstruction heads of 3D-PointCapsNet and AtlasNetV2.
            Canonical Capsules provide semantically consistent decomposition that
            is aligned in the canonical frame, leading to improved reconstruction
            quality and unsupervised classification performance.
        </p>
    </div>

    <br><br><br>

    <table class="teaser_vid">
        <tr>
            <td colspan="6" style="background-color: #eeeeee; font-size: 20px;">
                Results with the single-category Canonical Capsules
            </td>
        </tr>
        <tr>
            <td colspan="6">
                <video width="100%" preload="auto" playsinline webkit-playsinline loop autoplay muted>
                    <source src="videos/single.mp4" type="video/mp4">
                </video>
            </td>
        </tr>
        <tr>
            <td width=16.66% style="font-size: 18px;"> Input </td>
            <td width=16.66% style="font-size: 18px;"> Decomposition </td>
            <td width=16.66% style="font-size: 18px;"> <i>Ours</i> reconstruction in canonical frame - not a still image! </td>
            <td width=16.66% style="font-size: 18px;"> <i>Ours</i> reconstruction in input frame</td>
            <td width=16.66% style="font-size: 18px;"> 3D-PointCapsNet reconstruction </td>
            <td width=16.66% style="font-size: 18px;"> AtlasNetV2 reconstruction </td>
        </tr>
    </table>
    <br>

    <table class="teaser_vid">
        <tr>
            <td colspan="6" style="background-color: #eeeeee; font-size: 20px;">
                Results with the multi-category Canonical Capsules
            </td>
        </tr>
        <tr>
            <td colspan="6">
                <video width="100%" loop autoplay muted>
                    <source src="videos/multiclass.mp4" type="video/mp4">
                </video>
            </td>
        </tr>
        <tr>
            <td width=16.66% style="font-size: 18px;"> Input </td>
            <td width=16.66% style="font-size: 18px;"> Decomposition </td>
            <td width=16.66% style="font-size: 18px;"> <i>Ours</i> reconstruction in canonical frame </td>
            <td width=16.66% style="font-size: 18px;"> <i>Ours</i> reconstruction in input frame</td>
            <td width=16.66% style="font-size: 18px;"> 3D-PointCapsNet reconstruction </td>
            <td width=16.66% style="font-size: 18px;"> AtlasNetV2 reconstruction </td>
        </tr>
    </table>
    <div class="container">
        <!-- <div id="bar"></div><br> -->
        <hr style="clear:both;" />
        <p style="text-align:center;">
            The supplementary videos are encoded with FFmpeg using the H.264 codec. <br> If you cannot play the videos, please
            download the VLC player at: <a
                href="http://www.videolan.org/vlc/index.html">http://www.videolan.org/vlc/index.html</a>
        </p>
        <hr style="clear:both;" />
    </div>

</body>

</html>
