\begin{abstract}

We introduce a novel active learning framework for video annotation. By
judiciously choosing which frames a user should annotate, we can obtain highly
accurate tracks with minimal user effort.  We cast this problem as one of
active learning, and show that we can obtain excellent performance by querying
frames that, if annotated, would produce a large expected change in the
estimated object track. We implement a constrained tracker and compute the
expected change for putative annotations with efficient dynamic programming
algorithms.  We demonstrate our framework on four datasets, including two
benchmark datasets constructed with key frame annotations obtained by Amazon
Mechanical Turk. Our results indicate that we could obtain equivalent labels
for a small fraction of the original cost.

\end{abstract}
