Tamara Berg, Alexander Berg, Jaety Edwards, David Forsyth
The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. A simple clustering method can produce fair results. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.
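As an illustration of how clustering and a context model might be combined, the following is a minimal sketch and not the procedure used in this work. It assumes each face is a small feature vector, each face has a list of candidate names taken from its caption, and a hypothetical table `context_prob` gives the probability that each candidate is depicted given the caption context. The function name `label_faces`, the scoring rule (log context probability minus squared distance to the name's mean face), and the toy data are all assumptions made for the example.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def label_faces(faces, candidates, context_prob, iters=10):
    """Assign one candidate name to each face (illustrative sketch only).

    faces        : list of feature vectors (assumed representation)
    candidates   : candidates[i] is the list of names in face i's caption
    context_prob : context_prob[i][name] > 0, a hypothetical context score
    """
    # Start from the name that the context model alone prefers.
    labels = [max(c, key=lambda n, i=i: context_prob[i][n])
              for i, c in enumerate(candidates)]
    for _ in range(iters):
        # Appearance model: one mean vector per currently used name.
        groups = {}
        for face, name in zip(faces, labels):
            groups.setdefault(name, []).append(face)
        centers = {name: mean(vs) for name, vs in groups.items()}

        def score(i, name):
            # Names with no assigned faces have no appearance model yet.
            if name not in centers:
                return float("-inf")
            return math.log(context_prob[i][name]) - sq_dist(faces[i], centers[name])

        new_labels = [max(c, key=lambda n, i=i: score(i, n))
                      for i, c in enumerate(candidates)]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: two appearance clusters; face 4 has a weak context cue toward "B".
faces = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.05, 0.05]]
candidates = [["A"], ["A", "B"], ["B"], ["A", "B"], ["A", "B"]]
context_prob = [
    {"A": 1.0},
    {"A": 0.5, "B": 0.5},
    {"B": 1.0},
    {"A": 0.5, "B": 0.5},
    {"A": 0.4, "B": 0.6},
]
print(label_faces(faces, candidates, context_prob))
```

In this toy setup, the context score and the appearance score are simply added in log space, so a weak context preference can be overridden once the appearance clusters become well separated. How the two sources of evidence are actually weighted is a design choice that this sketch does not attempt to reproduce.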