Chao Chen, Han Liu, Dimitris Metaxas, Tianqi Zhao
This paper studies the following problem: given samples from a high dimensional discrete distribution, we want to estimate the leading $(\delta,\rho)$-modes of the underlying distributions. A point is defined to be a $(\delta,\rho)$-mode if it is a local optimum of the density within a $\delta$-neighborhood under metric $\rho$. As we increase the ``scale'' parameter $\delta$, the neighborhood size increases and the total number of modes monotonically decreases. The sequence of the $(\delta,\rho)$-modes reveal intrinsic topographical information of the underlying distributions. Though the mode finding problem is generally intractable in high dimensions, this paper unveils that, if the distribution can be approximated well by a tree graphical model, mode characterization is significantly easier. An efficient algorithm with provable theoretical guarantees is proposed and is applied to applications like data analysis and multiple predictions.