Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Richard Zemel, Miguel Carreira-Perpiñán
Many machine learning algorithms for clustering or dimensionality re- duction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then parti- tioned (clustering) or used to redeﬁne metric information (dimensional- ity reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fully- connected graph, a local ﬁxed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each ﬁt to a perturbed version of the data set. We show that such a graph ensemble usually pro- duces a better representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimension- ality reduction algorithm based on the graph.