Eric Xing, Michael Jordan, Stuart J. Russell, Andrew Ng
Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many “plausible” ways, and if a clustering algorithm such as K-means initially fails to ﬁnd one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufﬁciently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they con- sider “similar.” For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, , learns a distance metric over if desired, dissimilar) pairs of points in that respects these relationships. Our method is based on posing met- ric learning as a convex optimization problem, which allows us to give efﬁcient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to signiﬁcantly improve clustering performance.