Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper proposes a mechanism for conditioning temporal convolutions on a sentence embedding, in the context of aligning sentences with video segments. The reviewers agree that this is solid work with good experimental results. The novelty appears limited to the sentence grounding task, so the contribution is somewhat incremental. However, the reviewers highlight the efficiency of the approach in terms of memory and computation, and feel the results will be of interest to vision and language researchers.