TAP-Vid: A Benchmark for Tracking Any Point in a Video

DAVIS and Robotics Point Tracking

To illustrate what our quantitative results mean in practice, we apply TAP-Net to representative examples from DAVIS, using query points for which we have ground truth, and to representative examples from RGB-Stacking. Overall, TAP-Net's performance is good enough to be potentially useful for downstream tasks, but the problem remains far from solved, even though TAP-Net was the strongest performer among the algorithms we tested.

Qualitative validation of flow-based interpolation

When annotating videos, we interpolate between the sparse points chosen by the annotators by finding tracks that minimize the discrepancy with the optical flow while still passing through the chosen points. To validate that this improves results, we annotated several Kubric videos twice: once using the flow-based interpolation, and again using a naive linear interpolation, which simply moves the point at constant velocity between annotated points. In the paper, we compare the resulting points to ground truth quantitatively. For completeness, here we include the full set of Kubric videos used for this experiment.
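To make the comparison concrete, the two interpolation schemes can be sketched as follows. This is a simplified illustration, not our actual annotation pipeline: the `flows` argument, the `flow_guided_interp` name, and the strategy of chaining flow forward and then spreading the endpoint residual linearly are all assumptions made for this sketch, standing in for the full optimization over tracks.

```python
import numpy as np

def linear_interp(p_start, p_end, num_frames):
    """Naive baseline: move the point at constant velocity
    between the two annotated positions."""
    t = np.linspace(0.0, 1.0, num_frames)[:, None]
    return (1 - t) * np.asarray(p_start, float) + t * np.asarray(p_end, float)

def flow_guided_interp(p_start, p_end, flows):
    """Hypothetical flow-guided sketch: chain optical flow forward from
    the first annotation, then spread the residual between the propagated
    endpoint and the second annotation linearly over time so the track
    still passes through both annotated points.

    flows: list of length num_frames - 1; flows[t](point) returns the 2D
           displacement from frame t to frame t + 1 (e.g. a bilinear
           lookup into a dense flow field; here just a callable).
    """
    num_frames = len(flows) + 1
    track = [np.asarray(p_start, dtype=float)]
    for t in range(num_frames - 1):
        track.append(track[-1] + flows[t](track[-1]))
    track = np.stack(track)
    # Close the gap to the annotated endpoint by distributing the
    # residual linearly across frames (zero at the start, full at the end).
    residual = np.asarray(p_end, dtype=float) - track[-1]
    weights = np.linspace(0.0, 1.0, num_frames)[:, None]
    return track + weights * residual
```

Note that when the flow is zero everywhere, the flow-guided track degenerates to the linear baseline; the two schemes differ only where the flow carries the point away from the constant-velocity path.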