Kaustav Kundu, Joseph Tighe
As classifications datasets progressively get larger in terms of label space and number of examples, annotating them with all labels becomes non-trivial and expensive task. For example, annotating the entire OpenImage test set can cost $6.5M. Hence, in current large-scale benchmarks such as OpenImages and LVIS, less than 1\% of the labels are annotated across all images. Standard classification models are trained in a manner where these un-annotated labels are ignored. Ignoring these un-annotated labels result in loss of supervisory signal which reduces the performance of the classification models. Instead, in this paper, we exploit relationships among images and labels to derive more supervisory signal from the un-annotated labels. We study the effectiveness of our approach across several multi-label computer vision benchmarks, such as CIFAR100, MS-COCO panoptic segmentation, OpenImage and LVIS datasets. Our approach can outperform baselines by a margin of 2-10% across all the datasets on mean average precision (mAP) and mean F1 metrics.