We present a simple distributed algorithm that, given a regular graph consisting of two communities (or clusters), each inducing a good expander and such that the cut between them has sparsity $1/\mbox{polylog}(n)$, recovers the two communities.

More precisely, upon running the protocol, every node assigns itself a binary label of $m = \Theta(\log n)$ bits, so that with high probability, for all but a small number of outliers, nodes within the same community are assigned labels with Hamming distance $o(m)$, while nodes belonging to different communities receive labels with Hamming distance at least $m/2 - o(m)$. We refer to such an outcome as a “community sensitive labeling” of the graph.

Our algorithm uses $\Theta(\log^2 n)$ local memory and computes the community sensitive labeling after each node performs $\Theta(\log^2 n)$ steps of local work. Our algorithm and its analysis work in the “(random) population protocol” model, in which anonymous nodes do not share any global clock (the model is asynchronous) and communication occurs over one single (random) edge per round. We believe, this is the first provably-effective protocol for community detection that works in this model.

Joint work with Luca Becchetti, Andrea Clementi, Francesco Pasquale, Prasad Raghavendra, and Luca Trevisan.

Link to the arxiv version: https://arxiv.org/abs/1703.05045