Seminar
Pole Algorithms and discrete structures, thematic team Distributed computing
Day, hour and place: Tuesday at 2pm, room 1007
Contact(s): Pierre Fraigniaud

Previous talks

Year 2020

Distributed computing
Thursday June 4, 2020, 2PM, Online
Mikaël Rabie (LIP6, Sorbonne Université)
Lower bounds for maximal matchings and maximal independent sets
There are distributed graph algorithms for finding maximal matchings and maximal independent sets in O(Delta+log^*n) communication rounds; here n is the number of nodes and Delta is the maximum degree. The lower bound by Linial (1987, 1992) shows that the dependency on n is optimal: these problems cannot be solved in o(log^*n) rounds even if Delta=2. However, the dependency on Delta is a long-standing open question, and there is currently an exponential gap between the upper and lower bounds. We prove that the upper bounds are tight. In this presentation, I will first review previous results on the bounds depending on n, before presenting the tools and ideas we used for the Delta dependency. This is joint work with Alkida Balliu, Sebastian Brandt, Juho Hirvonen, Dennis Olivetti, and Jukka Suomela. These results received the Best Paper Award at FOCS 2019.

Year 2019

Distributed computing
Monday June 24, 2019, 11AM, Salle 3052
Carola Doerr (CNRS, LIP6 Sorbonne University)
Evolutionary Algorithms – From Theory to Practice and Back
Most real-world optimization problems do not have an explicit problem formulation, but only allow one to compute the quality of selected solution candidates. Solving such black-box optimization problems requires iterative procedures that use the feedback gained from previous evaluations to determine the strategy by which the next solution candidates are generated. Many black-box optimization algorithms, such as Simulated Annealing, Evolutionary Algorithms, and Swarm Intelligence Algorithms, are randomized, making it very difficult to analyze their performance mathematically. In the last 15 years, the theory of randomized black-box optimization has advanced considerably, and has contributed to efficient optimization by providing insights into the working principles of black-box optimization which are hard or impossible to obtain by empirical means. On the other hand, empirically-guided benchmarking has opened up new research directions for theoretical investigations. In this presentation we will discuss the state of the art in the theory of randomized black-box optimization algorithms. As part of this critical survey, we will also mention a number of open questions and connections to other fields of Computer Science.
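To make the black-box setting concrete, here is a minimal sketch (an illustration, not taken from the talk) of a (1+1) evolutionary algorithm maximizing the classic OneMax benchmark; the algorithm only ever sees the fitness values returned for the candidates it queries. All names and parameters are illustrative.

```python
import random

def one_max(x):
    """Black-box fitness: number of ones in the bit string (hidden from the optimizer)."""
    return sum(x)

def one_plus_one_ea(n, fitness, max_evals=100_000, seed=0):
    """(1+1) EA: keep one parent, flip each bit with probability 1/n, accept if not worse."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    f_parent = fitness(parent)
    for _ in range(max_evals):
        child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
        f_child = fitness(child)
        if f_child >= f_parent:   # elitist acceptance
            parent, f_parent = child, f_child
        if f_parent == n:         # optimum known here only for the demo
            break
    return parent, f_parent

best, value = one_plus_one_ea(50, one_max)
print(value)  # typically 50, reached after roughly n log n evaluations on average
```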
Distributed computing
Tuesday March 26, 2019, 2PM, Salle 3052
Anaïs Durand (LIP6, Sorbonne Université)
Contention-related crash failures
In the literature, two kinds of process crashes are usually considered: initially dead processes and “classical” crash failures (that can occur at any time). A new notion of process failure explicitly related to contention has recently been introduced by Gadi Taubenfeld (NETYS 2018). More precisely, given a predefined contention threshold λ, this notion considers executions in which process crashes are restricted to occur only when process contention is smaller than or equal to λ. If crashes occur after contention has exceeded λ, there are no correctness guarantees. It was shown that, when λ = n-1, consensus can be solved in an n-process asynchronous read/write system despite the crash of one process, thereby circumventing the well-known FLP impossibility result. In this talk, I will present k-set agreement and renaming algorithms that are resilient to two types of process crash failures: “λ-constrained” crash failures (as previously defined) and classical crash failures.

Distributed computing
Wednesday March 6, 2019, 11AM, Salle 3052
Marc Lelarge (Inria & ENS Paris)
Spectral embedding for graph classification
Learning on graphs requires a graph feature representation able to discriminate among different graphs while being amenable to fast computation. The graph isomorphism problem tells us that no fast representation of graphs is known if we require the representation to be both invariant to node permutations and able to discriminate between two non-isomorphic graphs. Most graph representations explored so far are required to be invariant. We explore new graph representations by relaxing this constraint. We present a generic embedding of graphs relying on spectral graph theory, called Spectral Graph Embedding (SGE). We show that for a large family of graphs, our embedding is still invariant. To evaluate the quality and utility of our SGE, we apply it to the graph classification problem. Through extensive experiments, we show that a simple Random Forest based classification algorithm driven by our SGE matches the state-of-the-art methods. Although our SGE is handcrafted, we also show how our generic embedding technique can be learned and built in a data-driven manner, opening the way to new learning algorithms and deep learning architectures with some invariance constraints built in.

Distributed computing
Tuesday March 5, 2019, 2PM, Salle 3052
Mikaël Rabie (IRIF)
Distributed Reconfiguration of Graph Problems
In graph theory, a reconfiguration problem is as follows: given two solutions of a problem and a notion of transition, is there a path of acceptable solutions from the first to the second, applying one transition after another? What is the length of the shortest reconfiguration path? What complexities are involved? For example, a recoloring problem asks whether we can go from one coloring to another by changing the color of one node at each transition (with the coloring remaining valid in the intermediate steps). In this talk, the goal is to consider distributed versions of two reconfiguration problems: recoloring, and reconfiguring independent sets. To parallelize the process, we allow the states of several nodes to change at once, under certain hypotheses (for example, we recolor an independent set of nodes). The question is, in the LOCAL model, how much communication is needed (i.e., how far a node must see into its neighborhood) in order to produce a reconfiguration schedule of a given length. For distributed recoloring, we prove that extra colors in the intermediate steps are needed in some cases in order to have a solution. I will provide the analysis of trees and toroidal grids, where we want to go from one 3-coloring to another with the use of a 4th color. In the case of trees, a constant-length schedule can be found after O(log n) rounds of communication. In the case of grids, a linear number of rounds is necessary. For the reconfiguration of independent sets, we will present different transitions from the centralized setting, and then justify the one we use. We prove that a constant-length schedule can be found after O(log* n) rounds, and a linear-length schedule can be found after a constant number of rounds. These works are the result of collaborations with Marthe Bonamy, Keren Censor-Hillel, Paul Ouvrard, Jara Uitto and Jukka Suomela.
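As a small illustration of the objects involved (a hypothetical checker, not the distributed algorithm from the talk), the sketch below verifies that a proposed recoloring schedule is valid: every step recolors an independent set of nodes and every intermediate coloring stays proper.

```python
def is_proper(adj, coloring):
    """A coloring (node -> color) is proper if no edge has both endpoints the same color."""
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])

def valid_schedule(adj, start, target, steps):
    """steps: list of dicts {node: new_color}. Each step must recolor an independent
    set of nodes, keep the coloring proper, and the last coloring must equal target."""
    coloring = dict(start)
    if not is_proper(adj, coloring):
        return False
    for step in steps:
        nodes = set(step)
        if any(v in nodes for u in nodes for v in adj[u]):   # not an independent set
            return False
        coloring.update(step)
        if not is_proper(adj, coloring):
            return False
    return coloring == target

# Toy example on a path a-b-c: go from one proper 3-coloring to another in one step
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
start, target = {"a": 1, "b": 2, "c": 1}, {"a": 3, "b": 2, "c": 3}
print(valid_schedule(adj, start, target, [{"a": 3, "c": 3}]))  # True
```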
Distributed computing
Tuesday February 19, 2019, 11AM, Salle 3052
Danupon Nanongkai (KTH)
Distributed Shortest Paths, Exactly
This talk concerns the problem of quickly computing distances and shortest paths on distributed networks (the CONGEST model). There have been many developments for this problem in the last few years, resulting in tight approximation schemes. This left open whether exact algorithms can perform equally well. In this talk, we will discuss some recent progress in answering this question. The most recent works that this talk is based on are with Sebastian Krinninger (FOCS 2018) and Aaron Bernstein (arXiv 2018).

Graphes, Distributed computing
Tuesday January 22, 2019, 2PM, Salle 3052
Guillaume Ducoffe (ICI Romania)
Computing Giant Diameters with Breadth-First Search and Range Queries
A well-known problem for which it is difficult to improve the textbook algorithm is computing the graph diameter. We present two versions of a simple algorithm (one being Monte Carlo and the other deterministic) that, for every fixed $h$ and unweighted undirected graph $G$ with $n$ vertices and $m$ edges, either correctly concludes that $diam(G) < hn$ or outputs $diam(G)$, in time ${\cal O}(m+n^{1+o(1)})$. The algorithm combines a simple randomized strategy for this problem (Damaschke, {\it IWOCA'16}) with a popular framework for computing graph distances that is based on range trees (Cabello and Knauer, {\it Computational Geometry'09}). We also prove that under the Strong Exponential Time Hypothesis (SETH), we cannot compute the diameter of a given $n$-vertex graph in truly subquadratic time, even if the diameter is $\Theta(n/\log{n})$.

Year 2018

Distributed computing
Tuesday October 23, 2018, 2PM, Salle 3052
Yllka Velaj (CWI Amsterdam)
Stable Outcomes in Modified Fractional Hedonic Games
In coalition formation games, self-organized coalitions are created as a result of the strategic interactions of independent agents. For each pair of agents $(i,j)$, a weight $w_{i,j}=w_{j,i}$ reflects how much agents $i$ and $j$ benefit from belonging to the same coalition. We consider the modified fractional hedonic game, that is, a coalition formation game in which agents' utilities are such that the total benefit of agent $i$ belonging to a coalition (given by the sum of $w_{i,j}$ over all other agents $j$ belonging to the same coalition) is averaged over all the other members of that coalition, i.e., excluding herself. Modified fractional hedonic games constitute a class of succinctly representable hedonic games. We are interested in the scenario in which agents, individually or jointly, choose to form a new coalition or to join an existing one, until a stable outcome is reached. To this aim, we consider common stability notions, leading to strong Nash stable outcomes, Nash stable outcomes or core stable outcomes: we study their existence, complexity and performance, both in the case of general weights and in the case of 0-1 weights.
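To make the utility definition concrete, here is a small hypothetical sketch (the function names and the brute-force check are mine, not from the paper): an agent's utility in a coalition is the sum of its weights to the other members divided by the number of other members, and an outcome is Nash stable if no agent can strictly improve by moving, alone, to another coalition or to a new singleton coalition.

```python
def utility(i, coalition, w):
    """Modified fractional hedonic utility: average weight from i to the *other* members."""
    others = [j for j in coalition if j != i]
    if not others:
        return 0.0
    return sum(w.get((min(i, j), max(i, j)), 0.0) for j in others) / len(others)

def nash_stable(partition, w, agents):
    """True if no agent strictly prefers joining another coalition or going solo."""
    for i in agents:
        current = next(c for c in partition if i in c)
        u_now = utility(i, current, w)
        for target in [c for c in partition if c is not current] + [set()]:
            if utility(i, target | {i}, w) > u_now:
                return False
    return True

# Toy instance with symmetric weights stored under sorted keys (i, j)
w = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 0.1}
agents = [1, 2, 3]
print(nash_stable([{1, 2, 3}], w, agents))    # True: the grand coalition is Nash stable
print(nash_stable([{1, 2}, {3}], w, agents))  # False: agent 3 prefers joining {1, 2}
```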
Distributed computing
Tuesday July 10, 2018, 3PM, Salle 3052
Nicolas K. Blanchard (IRIF)
Human-Computable Passwords
Despite 25+ years of online experience and many innovations, passwords are still central to security, with all the risks and frustration they imply. Neither biometrics, nor password managers, nor systems like Facebook Connect can guarantee the security we require, and remembering dozens of different passwords becomes a usability nightmare. Following in the footsteps of Manuel Blum et al., we recently proposed a humanly computable password scheme called Cue-Pin-Select. It can generate (and regenerate) passwords on the go, using only the user's brain for computation. It creates nigh-unforgettable passwords, requires no external storage or computing device, and can be executed in less than a minute to create a new password (potentially down to 30 seconds with a bit of training). All passwords generated are 52-bit secure and can maintain 40+ bits of security even against adversaries in possession of a stolen password. This talk will summarize work done in the past 6 months with Ted Selker and does not require any prior knowledge. It will start with the Cue-Pin-Select algorithm, cover an improvement we found that applies to all passphrase-based security systems, and explain some of the work currently being done to build better tools for studying password schemes and human computation. The talk will be given in English unless the audience is entirely composed of French speakers.

Distributed computing
Thursday July 5, 2018, 2PM, Salle 3052
Laurent Viennot (Inria Paris and IRIF)
Introduction to the blockchain
The collaborative construction of a blockchain is at the core of the Bitcoin protocol, allowing the various participants to agree on a list of valid transactions. The talk will be very informal; the intent is to give a very basic introduction to how Bitcoin works and to highlight some of the scientific challenges behind it.

Distributed computing
Friday May 4, 2018, 11AM, Salle 3052
Emanuele Natale (Max-Planck-Institut für Informatik)
Pooling or Sampling: Collective Dynamics for Electrical Flow Estimation
The computation of electrical flows is a crucial primitive for many recently proposed optimization algorithms on weighted networks. While typically implemented as a centralized subroutine, the ability to perform this task in a fully decentralized way is implicit in a number of biological systems. Thus, a natural question is whether this task can provably be accomplished in an efficient way by a network of agents executing a simple protocol. We provide a positive answer, proposing two distributed approaches to electrical flow computation on a weighted network: a deterministic process mimicking Jacobi's iterative method for solving linear systems, and a randomized token diffusion process based on revisiting a classical random walk process on a graph with an absorbing node. We show that both processes converge to a solution of Kirchhoff's node potential equations, derive bounds on their convergence rates in terms of the weights of the network, and analyze their time and message complexity.
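For reference, here is a minimal centralized sketch of the textbook Jacobi iteration applied to Kirchhoff's node potential equations on a small weighted graph (the talk's contribution is the distributed analysis, not this iteration; the variable names, the grounding convention and the fixed iteration count are illustrative assumptions).

```python
def jacobi_potentials(adj, injection, ground, iters=2000):
    """adj: u -> {v: weight}; injection: external current at each node; ground: node
    whose potential is fixed to 0. Jacobi update for the node potential equations:
    x[u] <- (injection[u] + sum_v w(u,v) * x[v]) / sum_v w(u,v)."""
    x = {u: 0.0 for u in adj}
    for _ in range(iters):
        new_x = {}
        for u in adj:
            if u == ground:
                new_x[u] = 0.0
                continue
            total_w = sum(adj[u].values())
            new_x[u] = (injection[u] + sum(w * x[v] for v, w in adj[u].items())) / total_w
        x = new_x
    return x

# Unit current injected at 'a' and extracted at 'c' on the weighted path a-b-c
adj = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 2.0}, "c": {"b": 2.0}}
injection = {"a": 1.0, "b": 0.0, "c": -1.0}
pot = jacobi_potentials(adj, injection, ground="c")
print(pot)                          # approximately {'a': 1.5, 'b': 0.5, 'c': 0.0}
print(1.0 * (pot["a"] - pot["b"]))  # Ohm's law on edge (a, b): flow of about 1
```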
Distributed computing
Wednesday May 2, 2018, 2PM, Salle 3052
Mauro Sozio (Telecom Paristech)
Fully Dynamic k-center Clustering and Graph Partitioning
Recently, we have witnessed an explosion in the amount of data readily available to us. This allows us to study and understand the world we live in with unprecedented detail. However, the challenges of processing large amounts of data evolving over time are non-trivial. Our talk will start by discussing a few recent results in large-scale graph mining. In particular, we will briefly discuss the main ideas that allowed us to find dense subgraphs in graphs containing up to 20 billion edges. We will then continue by presenting a fully dynamic algorithm for k-center clustering, which is the problem of finding a partition of the input data points into k sets of minimum maximum radius. Most of the effort in developing dynamic data mining algorithms has focused on the sliding window model (where at any given point in time only the most recent data items are retained) or on more simplistic models. However, in many real-world applications one might need to deal with arbitrary deletions and insertions. For example, one might need to remove data items that are not necessarily the oldest ones, because they have been flagged as containing inappropriate content or due to privacy concerns. Clustering trajectory data might also require dealing with more general update operations. We develop a (2+epsilon)-approximation algorithm for the k-center clustering problem with “small” amortized cost under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in the input is bounded by a polynomial, while k and epsilon are constant. Our theoretical results are complemented by an extensive experimental evaluation on dynamic data from Twitter and Flickr, as well as trajectory data, demonstrating the effectiveness of our approach. We conclude our talk with some interesting directions for future work. In particular, we discuss the challenges in developing a fully dynamic algorithm for graph partitioning.

Séminaire du Pôle, Distributed computing
Thursday March 22, 2018, 2PM, Salle 3052
Przemyslaw Uznanski (ETH Zurich)
Searching in the presence of errors, revisited
In searching with errors, the Questioner has to locate a target among $n$ objects by repeatedly asking queries to the Responder. The possible queries are limited by the structure of the searched space and, depending on the error model, the answers are subject either to adversarial lies (bounded in absolute number, or bounded in rate $r \in [0,1]$) or to probabilistic errors with rate $p \in [0,1]$. Our interest lies in distance-based searching in graphs and in comparison-based searching of integer ranges. In graph searching with vertex queries, whenever the Questioner queries a vertex $v$, the Responder answers with a neighbor $w$ of $v$ that lies on a shortest path from $v$ to the target, or with $v$ itself (as introduced by Emamjomeh-Zadeh et al. [STOC 2016]). We show that a simple strategy of querying a weighted 1-median vertex and multiplicatively updating the weights of vertices leads to $O(\log n + L)$ queries with a fixed number $L$ of lies, generalizing the results of Emamjomeh-Zadeh et al. By a careful selection of the potential, this leads to $O(\log n / \varepsilon^2)$ queries when the number of lies is bounded linearly with rate $r = 1/2-\varepsilon$, or when answers are subject to errors with probability $p = 1/2-\varepsilon$ (in that case the target is located correctly w.h.p.). This improves the result of Emamjomeh-Zadeh et al. for some choices of parameters (locating w.h.p. when $\varepsilon = \Omega(1/\sqrt{\log n})$), and generalizes their result to other error models. We introduce an edge query model in which, whenever the Questioner queries an edge $e$, the Responder answers with the endpoint of $e$ that is not farther from the target. We show that an appropriately defined weighted 1-median edge leads to a strategy of length $O(\Delta(\log n + L))$ in graphs of maximum degree $\Delta$, which gives $O(\Delta \log n / \varepsilon^2)$ queries when lies are bounded linearly with rate $r = \frac{1}{\Delta+1}(1-\varepsilon)$, and $O(\Delta \log^2 n/\varepsilon^2)$ queries with error probability $p = 1/2-\varepsilon$. Our results for edge queries apply almost automatically to comparison-based search of integer ranges, simplifying and improving many existing results. We also note that vertex-based queries translate to ternary comparisons (not considered previously, to our knowledge), and this shows a separation between the power of binary and ternary comparisons under linearly bounded lies (the Questioner can find the target iff $r < \frac13$ in the former case and $r < \frac12$ in the latter). By tweaking the initial distribution of potentials, our results also apply to searching in unbounded integer domains, leading both to simplifications and to significant improvements in the length of strategies, and providing the first results for many combinations of query model and error model.
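The vertex-query strategy on a path (equivalently, ternary comparison search over an integer range) can be illustrated with a short, self-contained sketch. This is a simplified toy version under my own assumptions (a fixed budget of L lies, a particular lying responder, weights 2^-contradictions), not the exact potential or analysis from the paper: query the weighted 1-median, count how often each candidate is contradicted, and discard candidates contradicted more than L times, since the target can never be contradicted more than L times.

```python
def weighted_median(weights):
    """Index m such that neither strict side of m holds more than half of the total weight."""
    total, acc = sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= total / 2:
            return i
    return len(weights) - 1

def search_with_lies(n, answer, L, max_rounds=10_000):
    """Locate the target in range(n) with ternary answers '<', '>', '=' that may lie at
    most L times: keep contradiction counts, weight candidates by 2^-count, and always
    query the weighted 1-median. Only the target is guaranteed to stay at count <= L."""
    contradictions = [0] * n
    for _ in range(max_rounds):
        alive = [i for i in range(n) if contradictions[i] <= L]
        if len(alive) == 1:
            return alive[0]
        weights = [2.0 ** -c if c <= L else 0.0 for c in contradictions]
        q = weighted_median(weights)
        a = answer(q)   # '<': target is left of q, '>': right of q, '=': q is the target
        for i in range(n):
            consistent = (a == '<' and i < q) or (a == '>' and i > q) or (a == '=' and i == q)
            if not consistent:
                contradictions[i] += 1
    raise RuntimeError("round budget exceeded")

def make_responder(target, lies):
    """A responder that lies on its first `lies` answers (except on equality), then is truthful."""
    state = {"left": lies}
    def answer(q):
        truth = '=' if q == target else ('<' if target < q else '>')
        if state["left"] > 0 and truth != '=':
            state["left"] -= 1
            return '<' if truth == '>' else '>'
        return truth
    return answer

print(search_with_lies(1000, make_responder(target=417, lies=2), L=2))  # 417
```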
Distributed computing
Tuesday March 20, 2018, 2PM, Salle 1007
Pierluigi Crescenzi (Université de Pise)
Computing node centrality in large graphs: from theory to practice and back
The computation of several graph measures based on the distance between nodes is very often part of the analysis of real-world complex networks. The diameter, the betweenness and closeness centralities, and the hyperbolicity are typical examples of such measures. In this talk, we will focus on the computation of one specific measure, namely node closeness centrality, which is basically the inverse of the average distance of a node from all other nodes of the graph. Even though polynomial-time algorithms are available for computing this measure, in practice these algorithms are not useful, due to the huge size of the networks to be analysed. A first theoretical question is, hence, whether better algorithms can be designed, whose worst-case complexity is (almost) linear in the size of the input graph. We will first show that, unfortunately, for closeness centrality no such algorithm exists (under reasonable complexity-theoretic assumptions). This will lead us back to the practical point of view: we will then describe a heuristic that allows us to compute the above measure in linear time in practice, even though its worst-case complexity remains intractable. This result will finally motivate our return to theory in order to understand why, in practice, this heuristic works so well: we will conclude by showing that, for several random graph generation models, the average time complexity of the heuristic is indeed (almost) linear. This talk will summarise research work done in collaboration with Elisabetta Bergamini, Michele Borassi, Michel Habib, Andrea Marino, Henning Meyerhenke, and Luca Trevisan, and will be mostly based on two papers presented at ALENEX 2016 and SODA 2017.
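For reference, a minimal sketch of the textbook exact computation that the talk takes as its starting point: one breadth-first search per node yields all closeness centralities of an unweighted connected graph in O(nm) time, which is precisely what becomes impractical on huge networks. The graph representation and names below are illustrative.

```python
from collections import deque

def closeness_centralities(adj):
    """Exact closeness centrality c(v) = (n - 1) / (sum of distances from v) for every
    node of an unweighted connected graph, via one BFS per node (O(n*m) in total)."""
    n = len(adj)
    closeness = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        closeness[source] = (n - 1) / sum(dist.values())
    return closeness

# Small example: on a star graph, the center has the largest closeness
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(closeness_centralities(adj))  # {0: 1.0, 1: 0.6, 2: 0.6, 3: 0.6}
```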
Distributed computing
Tuesday February 13, 2018, 2PM, Salle 1007
Nicolas Behr (ENS)
Combinatorics of stochastic rewriting systems
Based on recent joint work with G.H.E. Duchamp and K.A. Penson (arXiv:1712.06575, on the combinatorics of chemical reaction systems) and with V. Danos and I. Garnier (in preparation), I will explain how dynamical and statistical aspects of stochastic graph rewriting systems may be computed using combinatorial techniques. For clarity and brevity, the main part of the talk will focus on the special case of discrete graph rewriting (i.e. chemical reactions), but, time permitting, the generalization to generic graph rewriting, in the form of a stochastic mechanics formalism based on the concept of rule algebras, will be discussed as well.

Distributed computing
Tuesday January 16, 2018, 2PM, Salle 3014
Nicolas Thiery (Université Paris Sud)
Computing huge subspaces of diagonal harmonic polynomials: symmetries to the rescue!
Last spring, I visited François Bergeron and we worked on his favorite objects: the spaces H(n,k) of diagonal harmonic polynomials in k sets of n variables. These spaces connect to many areas of algebraic combinatorics, representation theory and beyond, and the case H(n,2) became famous a decade ago with the n! conjecture and its proof by Haiman. To fuel his ongoing studies, François needed to compute the structure of H(5,6). This is a space of dimension 6×10^5, made of polynomials in 30 variables of degree up to 15, each having thousands of terms. In this talk, I'll explain how the calculation can now be completed in 45 minutes with a dozen cores and about 15 GB of memory. This exploits a combination of strategies (symmetries, representation theory of the symmetric and general linear groups, …), each of which reduces the complexity in time and memory by one or two orders of magnitude. There will be few prerequisites, and it's my hope that some of the strategies (and maybe the code!) could be used in other contexts.

Year 2017

Distributed computing
Tuesday December 12, 2017, 2PM, Salle 3052
Jean Krivine (IRIF)
Incremental Update for Graph Rewriting
Graph rewriting formalisms are well-established models for the representation of biological systems such as protein-protein interaction networks. The combinatorial complexity of these models usually prevents any explicit representation of the variables of the system, and one has to rely on stochastic simulations in order to sample the possible trajectories of the underlying Markov chain. The bottleneck of stochastic simulation algorithms is the update of the propensity function that describes the probability that a given rule is applied next. In this talk we present an algorithm based on a data structure, called an extension basis, that can be used to update the counts of predefined graph observables after a rule of the model has been applied. Reference: Boutillier P., Ehrhard T., Krivine J. (2017) Incremental Update for Graph Rewriting. In: Yang H. (ed.) Programming Languages and Systems, ESOP 2017, Lecture Notes in Computer Science, vol. 10201, Springer, Berlin, Heidelberg.

Distributed computing
Tuesday November 14, 2017, 2PM, Salle 3052
Laurent Massoulié (MSR-Inria)
Rapid Mixing of Local Graph Dynamics
Graph dynamics arise naturally in many contexts. For instance, in peer-to-peer networks, a participating peer may replace an existing connection with one neighbour by a new connection with a neighbour's neighbour. Several such local rewiring rules have been proposed to ensure that peer-to-peer networks achieve good connectivity properties (e.g. high expansion) in equilibrium. However, it has remained an open question whether there exist such rules that also lead to fast convergence to equilibrium. In this work we provide an affirmative answer: we exhibit a local rewiring rule that converges to equilibrium after each participating node has undergone only a number of rewirings that is poly-logarithmic in the system size. The proof involves consideration of the whole isoperimetric profile of the graph, and may be of independent interest. This is joint work with Rémi Varloot.
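A toy sketch of the kind of local move described above (a generic illustration of one plausible rewiring step, not the specific rule analyzed in the talk): a node drops one of its connections and reconnects to a neighbour's neighbour, keeping the graph simple.

```python
import random

def local_rewiring_step(adj, rng):
    """Pick a node u, a neighbour v of u and a neighbour w of v, then replace the
    edge (u, v) by (u, w), provided this creates neither a loop nor a multi-edge."""
    u = rng.choice(sorted(adj))
    if not adj[u]:
        return
    v = rng.choice(sorted(adj[u]))
    candidates = [w for w in sorted(adj[v]) if w != u and w not in adj[u]]
    if not candidates:
        return
    w = rng.choice(candidates)
    adj[u].remove(v); adj[v].remove(u)   # drop (u, v)
    adj[u].add(w); adj[w].add(u)         # add (u, w)

# Run a few steps on a 6-cycle and inspect the resulting adjacency lists
rng = random.Random(1)
adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
for _ in range(20):
    local_rewiring_step(adj, rng)
print({u: sorted(vs) for u, vs in adj.items()})
```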
Séminaire de Pôle, Distributed computing
Tuesday October 17, 2017, 2PM, Salle 3052
Claire Mathieu (DI - ENS)
Online k-compaction
Given, at each time t = 1, 2, …, n, a new file of length l(t) and a read rate r(t), an online k-compaction algorithm must maintain a collection of at most k files, choosing (at each time t, without knowing future inputs) some of the files to merge into one, thereby incurring a merge cost equal to the total length of the merged files and a read cost equal to the read rate r(t) times the number of files present at time t. The goal is to minimize the total cost over time. Such k-compaction algorithms are a key component of log-structured merge trees, the file-based data structure underlying NoSQL databases such as Accumulo, Bigtable, Cassandra, HBase, and others. We initiate the theoretical study of k-compaction algorithms. We formalize the problem, consider worst-case, average-case and competitive analysis (per-instance optimality), and propose new algorithms that are optimal according to these metrics. This is joint work with Carl Staelin, Neal E. Young, and Arman Yousefi.
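To make the cost model concrete, here is a small sketch (an illustration under my own naming, not one of the algorithms from the paper) that replays an instance and charges the merge and read costs of a naive policy: keep files separate until there are more than k of them, then merge everything into one.

```python
def simulate(lengths, read_rates, k, choose_merge):
    """Replay an online k-compaction instance. At time t a file of length lengths[t]
    arrives; choose_merge returns the indices of current files to merge (or an empty
    set), after which at most k files may remain. Costs follow the model above."""
    files, total_cost = [], 0.0
    for length, rate in zip(lengths, read_rates):
        files.append(length)
        to_merge = choose_merge(files, k)
        if to_merge:
            merged = sum(files[i] for i in to_merge)
            total_cost += merged                                   # merge cost
            files = [f for i, f in enumerate(files) if i not in to_merge]
            files.append(merged)
        assert len(files) <= k, "policy must keep at most k files"
        total_cost += rate * len(files)                            # read cost at time t
    return total_cost

def merge_all_when_full(files, k):
    """Naive policy: do nothing until there are more than k files, then merge them all."""
    return set(range(len(files))) if len(files) > k else set()

lengths    = [4, 1, 1, 8, 1, 1, 1, 2]
read_rates = [1, 1, 1, 1, 1, 1, 1, 1]
print(simulate(lengths, read_rates, k=3, choose_merge=merge_all_when_full))  # total cost: 46.0
```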