Thematic team Complex Systems, Networks, and Distributed Computing

INRIA project-team GANG

## Complex systems

#### Day, hour and place

Tuesday at 2pm, room 1007

#### Contact(s)

#### Future talks

Systèmes complexes

jeudi 05 juillet 2018, 14h00, Salle 3052

**Laurent Viennot** (Inria Paris et IRIF) *Introduction to the blockchain*

The collaborative construction of a blockchain is the core of Bitcoin protocol for allowing various participants to agree on a list of valid transactions. The talk will be very informal, the intent is to give a very basic introduction on how Bitcoin works and highlight some of the scientific challenges behind it.

#### Past talks

Systèmes complexes

vendredi 04 mai 2018, 11h00, Salle 3052

**Emanuele Natale** (Max-Planck-Institut für Informatik) *Pooling or Sampling: Collective Dynamics for Electrical Flow Estimation*

The computation of electrical flows is a crucial primitive for many recently proposed optimization algorithms on weighted networks. While typically implemented as a centralized subroutine, the ability to perform this task in a fully decentralized way is implicit in a number of biological systems. Thus, a natural question is whether this task can provably be accomplished in an efficient way by a network of agents executing a simple protocol. We provide a positive answer, proposing two distributed approaches to electrical flow computation on a weighted network: a deterministic process mimicking Jacobi's iterative method for solving linear systems, and a randomized token diffusion process, based on revisiting a classical random walk process on a graph with an absorbing node. We show that both processes converge to a solution of Kirchhoff's node potential equations, derive bounds on their convergence rates in terms of the weights of the network, and analyze their time and message complexity.

Systèmes complexes

mercredi 02 mai 2018, 14h00, Salle 3052

**Mauro Sozio** (Telecom Paristech) *Fully Dynamic k-center Clustering and Graph Partitioning*

Recently, we have witnessed an explosion in the amount of data being readily available to us. This allows us to study and understand the world we live in with unprecedented details. However, the challenges of processing large amounts of data evolving over time are non-trivial. Our talk will start with discussing a few recent results in large-scale graph mining. In particular, we will briefly discuss the main ideas which allowed us to find dense subgraphs in graphs containing up to 20 billion edges. We will then continue our talk presenting a fully dynamic algorithm for k-center clustering, which is the problem of finding a partition of the input data points into k sets with minimum maximum radius. Most of the efforts in developing dynamic data mining algorithms have been focusing on the sliding window model (where at any given point in time only the most recent data items are retained) or more simplistic models. However, in many real-world applications one might need to deal with arbitrary deletions and insertions. For example, one might need to remove data items that are not necessarily the oldest ones, because they have been flagged as containing inappropriate content or due to privacy concerns. Clustering trajectory data might also require to deal with more general update operations.

We develop a (2+epsilon)-approximation algorithm for the k-center clustering problem with “small” amortized cost under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while k and epsilon are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic data from Twitter, Flickr, as well as trajectory data, demonstrating the effectiveness of our approach. We conclude our talk with some interesting directions for future work. In particular, we are going to discuss the challenges in developing a fully dynamic algorithm for graph partitioning.

Systèmes complexes

jeudi 22 mars 2018, 14h00, Salle 3052

**Przemyslaw Uznanski** (ETH Zurich) *Searching in the presence of errors, revisited*

In searching with errors, the Questioner has to locate a target among $n$ objects, by repeatedly asking queries to Responder. The possible queries are limited by the structure of the searched space, and depending on the error model, the answers are subject to either adversarial lies (bound in absolute number, or bound in rate $r \in [0,1]$), or to probabilistic errors with rate $p \in [0,1]$. Our interest lies in distance-based searching in graphs and in comparison based searching of integer ranges.

In graph searching with vertex queries, whenever Questioner queries a vertex $v$, Responder answers with neighbor $w$ of $v$ that lies on the shortest path from $v$ to target or with $v$ itself (as introduced by Emamjomeh-Zadeh et al. [STOC 2016]). We show that a simple strategy of querying weighted-1-median vertex and multiplicatively updating weights of vertices leads to $O(\log n + L)$ queries with fixed $L$ lies, generalizing results of Emamjomeh-Zadeh et al. By careful selection of potential this leads to $O(\log n / \varepsilon^2)$ queries when number of lies is bounded linearly by ratio $r = 1/2-\varepsilon$, or are subject to errors with probability $p = 1/2-\varepsilon$ (in that case target is located correctly w.h.p.). This improves result of Emamjomeh-Zadeh et al. for some selection of parameters (locating w.h.p. when $\varepsilon = \Omega(1/\sqrt{\log n})$, and generalizes their result to other error models.

We introduce an edge queries model, in which, whenever Questioner queries an edge $e$, Responder answers with endpoint of $e$ that is not further to target. We show that appropriately defining weighted-1-median edge leads to $O(\Delta(\log n + L))$ length strategy in graphs of max degree $\Delta$, which leads to $O(\Delta \log n / \varepsilon^2)$ queries when lies are bounded linearly with rate $r = \frac{1}{\Delta+1}(1-\varepsilon)$ and $O(\Delta \log^2 n/\varepsilon^2)$ with error probability $p = 1/2-\varepsilon$.

Our results for edge queries almost automatically apply to comparison based search of integer ranges, simplifying and improving many existing results. We also note that vertex based queries translate to (not considered previously to our knowledge) ternary comparisons, and this shows a separation between power of binary and ternary comparisons in the linearly bounded lies (Questioner can find target iff $r < \frac13$ for former case, $r < \frac12$ for latter case).

Tweaking initial distribution of potentials, our results apply to searching in unbounded integer domains as well, leading to both simplification and significant improvement in length of strategies, and providing first results for many combinations of query model and error model.

Systèmes complexes

mardi 20 mars 2018, 14h00, Salle 1007

**Pierluigi Crescenzi** (Universite de Pise) *Computing node centrality in large graphs: from theory to practice and back*

The computation of several graph measures, based on the distance between nodes, is very often part of the analysis of real-world complex networks. The diameter, the betweenness and the closeness centrality, and the hyperbolicity are typical examples of such measures. In this talk, we will focus on the computation of one specific measure, that is, the node closeness centrality which is basically the inverse of the average distance of a node from all other nodes of the graph. Even though polynomial-time algorithms are available for the computation of this measure, in practice these algorithms are not useful, due to the huge size of the networks to be analysed. One first theoretical question is, hence, whether better algorithms can be designed, whose worst-case complexity is (almost) linear in the size of the input graph. We will first show that, unfortunately, for the closeness centrality no such algorithm exists (under reasonable complexity theory assumptions). This will lead us back to the practical point of view: we will then describe a heuristics that allows us to compute the above measure in (practical) linear time, even though its worst-case complexity is (in practice) intractable. This result will finally motivate our return to theory in order to understand the reason why, in practice, this heuristics works so well: we will indeed conclude by showing that, in the case of several random graph generating models, the average time complexity of the heuristics is indeed (almost) linear. This talk will summarise the research work that I have done in collaboration with Elisabetta Bergamini, Michele Borassi, Michel Habib, Andrea Marino, Henning Meyerhenke, and Luca Trevisan, and will be mostly based on two papers presented at ALENEX 2016, and SODA 2017.

Systèmes complexes

mardi 13 février 2018, 14h00, Salle 1007

**Nicolas Behr** (ENS) *Combinatorics of stochastic rewriting systems*

Based on recent joint work with G.H.E. Duchamp and K.A. Penson (arXiv:1712.06575, on the combinatorics of chemical reaction systems) and with V. Danos and I. Garnier (in preparation), I will explain how dynamical and statistical aspects of stochastic graph rewriting systems may be computed using combinatorial techniques. The main part of the talk will focus on the special case of discrete graph rewriting (i.e. chemical reactions) for clarity and brevity, but time permitting the generalization to generic graph rewriting in the form of a stochastic mechanics formalism based on the concept of rule algebras will be discussed as well.

Systèmes complexes

mardi 16 janvier 2018, 14h00, Salle 3014

**Nicolas Thiery** (Université Paris Sud) *Computing huge subspaces of diagonal harmonic polynomials: symmetries to the rescue!*

Last spring, I visited François Bergeron and we worked on his favorite objects: the spaces H(n,k) of diagonal harmonic polynomials in k sets of n variables. Those spaces connect to many areas of algebraic combinatorics, representation theory and beyond, and the case H(n,2) became famous a decade ago with the n! conjecture and its proof by Haiman.

To fuel his ongoing studies François needed to compute the structure of H(5,6). This is a space of dimension 6.10^5 made of polynomials in 30 variables of degree up to 15, each having thousands of terms.

In this talk, I'll explain how the calculation can now be completed in 45 minutes with a dozen cores and ~15Go of memory. This exploits a combination of strategies (symmetries, representation theory of the symmetric and general linear group, …), each of which reduces the complexity in time and memory by one or two orders of magnitude.

There will be little prerequisites and it's my hope that some strategies (and maybe the code!) could be used in other contexts.

Systèmes complexes

mardi 12 décembre 2017, 14h00, Salle 3052

**Jean Krivine** (IRIF) *Incremental Update for Graph Rewriting*

Graph rewriting formalisms are well-established models for the representation of biological systems such as protein-protein interaction networks. The combinatorial complexity of these models usually prevents any explicit representation of the variables of the system, and one has to rely on stochastic simulations in order to sample the possible trajectories of the underlying Markov chain. The bottleneck of stochastic simulation algorithms is the update of the propensity function that describes the probability that a given rule is to be applied next. In this talk we present an algorithm based on a data structure, called extension basis, that can be used to update the counts of predefined graph observables after a rule of the model has been applied.

Reference: Boutillier P., Ehrhard T., Krivine J. (2017) Incremental Update for Graph Rewriting. In: Yang H. (eds) Programming Languages and Systems. ESOP 2017. Lecture Notes in Computer Science, vol 10201. Springer, Berlin, Heidelberg

Systèmes complexes

mardi 14 novembre 2017, 14h00, Salle 3052

**Laurent Massoulié** (MSR-Inria) *Rapid Mixing of Local Graph Dynamics*

Graph dynamics arise naturally in many contexts. For instance in peer-to-peer networks, a participating peer may replace an existing connection with one neighbour by a new connection with a neighbour's neighbour. Several such local rewiring rules have been proposed to ensure that peer-to-peer networks achieve good connectivity properties (e.g. high expansion) in equilibrium. However it has remained an open question whether there existed such rules that also led to fast convergence to equilibrium. In this work we provide an affirmative answer: We exhibit a local rewiring rule that converges to equilibrium after each participating node has undergone only a number of rewirings that is poly-logarithmic in the system size. The proof involves consideration of the whole isoperimetric profile of the graph, and may be of independent interest.

This is joint work with Rémi Varloot.

Systèmes complexes

mardi 17 octobre 2017, 14h00, Salle 3052

**Claire Mathieu** (DI - ENS) *Online k-compaction*

Given, at each time t = 1, 2, …, n, a new file of length l(t) and a read rate r(t), an online k-compaction algorithm must maintain a collection of at most k files, choosing (at each time t, without knowing future inputs) some of the files to merge into one, thereby incurring a merge cost equal to the total length of the merged files and a read cost equal to the read rate r(t) times the number of files present at time t. The goal is to minimize the total cost over time. K-compaction algorithms are a key component of log-structured merge trees, the file-based data structure underlying NoSQL databases such as Accumulo, Bigtable, Cassandra, HBase,and others. We initiate the theoretical study of k-compaction algorithms. We formalize the problem, consider worst-case, average-case and competitive analysis (per-instance optimality), and propose new algorithms that are optimal according to these metrics.

This is joint work with Carl Staelin, Neal E. Young, and Arman Yousefi.

#### Former seminar