Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

*Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic*

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected.

In practice, the most popular approaches rely on local search algorithms, not only for their ease of implementation and efficiency, but also because of their accuracy on many real-world graphs. For example, the Louvain algorithm, a local-search-based algorithm, has quickly become the method of choice for clustering in social networks. However, explaining the success of these methods remains an open problem: in the worst case, the runtime can be as large as \Omega(n^2), much worse than what is typically observed in practice, and no guarantee on the quality of the output can be established.

The goal of this paper is to shed light on the inner workings of Louvain: only if we understand Louvain can we rely on it and further improve it. To achieve this goal, we study the behavior of Louvain in the well-known two-block Stochastic Block Model, which has a clear ground truth and serves as the standard testbed for graph clustering algorithms. We provide tools for the analysis of Louvain that also apply to many other combinatorial algorithms. For example, we show that the probability for a node to have more edges towards its own community is 1/2 + \Omega(\min(\Delta(p-q)/\sqrt{np}, 1)) in the SBM(n,p,q), where \Delta is the imbalance. This bound is asymptotically tight and useful for the analysis of a wide range of algorithms (Louvain, Kernighan-Lin, Simulated Annealing, etc.).
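As a rough illustration of the quantity in this bound, the following Monte Carlo sketch estimates how often a node has strictly more neighbors in its own community than in the other one. It is not the paper's analysis: it assumes two ground-truth communities of equal size n/2, it does not model the imbalance \Delta at all, and the function names and default parameters are hypothetical choices made for this example.

```python
import random


def sample_binomial(rng, trials, prob):
    """Draw one Binomial(trials, prob) sample as a sum of Bernoulli draws."""
    return sum(1 for _ in range(trials) if rng.random() < prob)


def estimate_own_majority_probability(n, p, q, num_samples=2000, seed=0):
    """Estimate P(a node has strictly more edges to its own community).

    Assumptions (not taken from the paper): the two communities have
    equal size n/2, edges inside a community appear independently with
    probability p, and edges across communities with probability q.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must be probabilities")

    rng = random.Random(seed)
    half = n // 2
    hits = 0
    for _ in range(num_samples):
        # The node has half - 1 potential neighbors in its own community
        # and half potential neighbors in the other community.
        own_edges = sample_binomial(rng, half - 1, p)
        other_edges = sample_binomial(rng, half, q)
        if own_edges > other_edges:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    n = 200
    for p, q in [(0.30, 0.30), (0.30, 0.25), (0.30, 0.10)]:
        estimate = estimate_own_majority_probability(n, p, q)
        print(f"n={n} p={p} q={q} estimate={estimate:.3f}")
```

Under these assumptions, the estimate should sit near 1/2 when p = q and move towards 1 as the gap p - q grows relative to \sqrt{p/n}, which is the qualitative behavior the bound describes.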
