
Thesis defenses
Friday 7 December 2018, 14:00, Salle François Jacob, Bâtiment Buffon
Pierre Cagne (IRIF) Towards a homotopical algebra of dependent types

http://www.normalesup.org/~cagne/soutenance.html

Thesis defenses
Friday 30 November 2018, 14:30, Room 580F, Bâtiment Halle aux Farines
Lucas Boczkowski (IRIF) Search and Broadcast in Stochastic Environments, a Biological Perspective

This thesis is built around two series of works, each motivated by experiments on ants. We derive and analyse new models that use computer science concepts and methodology, despite their biological roots and motivation.

The first model studied in this thesis takes its inspiration from the collaborative transport of food in the ant species P. longicornis. We find that some key aspects of the process are well described by a graph search problem with noisy advice. The advice corresponds to the characteristic short scent marks laid in front of the load to facilitate its navigation. In this thesis, we provide a detailed analysis of the model on trees, which are relevant graph structures from a computer science standpoint. In particular, our model may be viewed as a noisy extension of binary search to trees. We derive tight results, with matching upper and lower bounds, both in expectation and with high probability. Interestingly, there is a sharp phase transition in the expected runtime, but none when the algorithms are only required to succeed with high probability.
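For comparison only, the simplest special case of search with noisy advice is binary search on a path (a sorted range), where each comparison is answered wrongly with probability `p < 1/2`, independently. The sketch below uses the naive baseline of repeating each query an odd number of times and following the majority; it is not the algorithm analysed in the thesis, which works on trees under its own noise model. The names `noisy_binary_search`, `p` and `repeats` are illustrative assumptions.

```python
import random


def noisy_binary_search(n: int, target: int, p: float, repeats: int,
                        rng: random.Random) -> int:
    """Search for `target` in range(n) when each comparison lies with probability p.

    The query "is target >= mid?" is asked `repeats` times (an odd number,
    so there are no ties) and the majority answer is followed.
    """
    lo, hi = 0, n  # lo <= target < hi holds as long as every majority vote is right
    while hi - lo > 1:
        mid = (lo + hi) // 2
        votes = 0
        for _ in range(repeats):
            truth = target >= mid
            # With probability p the advice points the wrong way.
            answer = truth if rng.random() >= p else not truth
            votes += 1 if answer else -1
        if votes > 0:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    rng = random.Random(0)
    hits = sum(noisy_binary_search(1024, t, 0.1, 21, rng) == t
               for t in range(0, 1024, 8))
    print(f"{hits} / 128 targets found")
```

Repeating each query costs a multiplicative factor in the number of queries; the point of a dedicated analysis such as the one in the thesis is precisely to do better than this baseline.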

The second model we work with was initially designed to capture information broadcast among desert ants. It uses a stochastic meeting pattern and noisy interactions, in a way that matches experimental data. Within this theoretical model, we present a strong lower bound on the number of interactions required before information can be spread reliably. Experimentally, we observe that the time required to recruit even a few ants increases sharply with the group size, in accordance with our result. A theoretical consequence of the lower bound is a separation between the uniform noisy PUSH and PULL models of interaction. We also study a close variant of broadcast, this time without noise but under stricter convergence requirements, and show that in this case the problem can be solved efficiently, even with very limited exchange of information in each interaction.

Thesis defenses
Friday 23 November 2018, 14:00, Laboratoire MAP5, 45 rue des Saints-Pères, 7th floor, council room (salle du conseil)
Léo Planche (IRIF) Décomposition de graphes en plus courts chemins et en cycles de faible excentricité (Decomposition of graphs into shortest paths and low-eccentricity cycles)

In collaboration with biology researchers at Jussieu, we study graphs derived from biological data in order to better understand them. These graphs are built from DNA fragments called reads. Each read corresponds to a vertex, and two vertices are adjacent if the corresponding DNA sequences have a sufficient similarity rate. This yields graphs with a very particular structure, which we call hub-laminar. A graph is hub-laminar if it can be summarized by a few shortest paths such that every vertex of the graph is close to them. We study in detail the case where the graph consists of a single shortest path of low eccentricity. We improve the proof of an existing approximation algorithm and propose a new one, which computes a 3-approximation in linear time. We also analyse the link with the k-laminarity problem defined by Michel Habib and Finn Völkel, which consists in finding a diameter of low eccentricity. We then study the problem of the isometric cycle of minimum eccentricity. We show that this problem is NP-complete and propose two approximation algorithms. We then give a precise definition of the hub-laminar structure and present an approximation algorithm running in time O(nm). We test this algorithm on graphs generated by a random procedure and apply it to our biological data. Finally, we show that computing an isometric cycle of minimum eccentricity makes it possible to embed a graph into a cycle with low multiplicative distortion, while computing a hub-laminar decomposition yields a compact representation of distances with bounded additive distortion.
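In this setting, the eccentricity of a path P is the largest distance from any vertex of the graph to its nearest vertex on P. On an unweighted connected graph, a single multi-source breadth-first search started from all vertices of P computes it in linear time. The sketch below assumes an adjacency-list dictionary and only illustrates the quantity being minimized; it is not one of the approximation algorithms of the thesis, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List


def distances_to_set(adj: Dict[Hashable, List[Hashable]],
                     sources: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Multi-source BFS: distance from every reachable vertex to the closest source."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_eccentricity(adj: Dict[Hashable, List[Hashable]],
                      path: List[Hashable]) -> int:
    """Largest distance from a vertex of the graph to the vertex set of `path`."""
    dist = distances_to_set(adj, path)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected")
    return max(dist.values())


if __name__ == "__main__":
    # A path 0-1-2-3-4 with a pendant branch 1-5-6 hanging off vertex 1.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6)]
    adj = {v: [] for v in range(7)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    print(path_eccentricity(adj, [0, 1, 2, 3, 4]))  # 2: vertex 6 is two hops away
```

Choosing the shortest path 6-5-1-2-3-4 instead lowers the eccentricity to 1, which is the kind of choice the low-eccentricity path problem optimizes.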

Thesis defenses
Friday 19 October 2018, 09:30, Room 580F (thesis room), Bâtiment Halle aux Farines
Marie Kerjean (IRIF) Reflexive spaces of smooth functions: a logical account for linear partial differential equations

Around the Curry-Howard correspondence, proof theory has developed along two distinct lines: the theory of programming languages, in which formulas act as data types, and the semantic study of proofs. The latter consists in giving mathematical models of proofs and programs. In particular, denotational semantics distinguishes the data types that serve as input or output of programs, and in return allows a finer understanding of proofs and programs. Linear Logic (LL) gives a logical interpretation of the basic notions of linear algebra, while Differential Linear Logic (DiLL) allows for a logical understanding of differentiation.

This manuscript strengthens the link between proof theory and functional analysis, and highlights the role of linear involutive negation in DiLL. The first part of this thesis gives an overview of the prerequisites on the notions of linearity, polarisation and differentiation in proof theory, and provides the necessary background in the theory of locally convex topological vector spaces. The second part uses two standard topologies on the dual of a topological vector space and gives two models of DiLL: the weak topology allows only for a discrete interpretation of proofs through formal power series, while the Mackey topology on the dual allows for a smooth and polarised model of DiLL. Finally, the third part interprets proofs of DiLL by distributions. We detail a polarised model of DiLL in which negatives are nuclear Fréchet spaces and proofs are distributions with compact support. We show that solving linear partial differential equations with constant coefficients can be typed by a syntax similar to that of DiLL, which we detail.

Thesis defenses
Thursday 27 September 2018, 15:30, Room 470E, Bâtiment Halle aux Farines
Pablo Rotondo (IRIF) Probabilistic studies in Number Theory and Word Combinatorics: instances of dynamical analysis

Dynamical Analysis incorporates tools from dynamical systems, namely the Transfer Operator, into the framework of Analytic Combinatorics, permitting the analysis of numerous algorithms and objects naturally associated with an underlying dynamical system. This dissertation presents, in the integrated framework of Dynamical Analysis, the probabilistic analysis of seemingly distinct problems in a unified way: the probabilistic study of the recurrence function of Sturmian words, and the probabilistic study of the Continued Logarithm algorithm.

Sturmian words are a fundamental family of words in Word Combinatorics. They are, in a precise sense, the simplest infinite words that are not eventually periodic. Sturmian words have been well studied over the years, notably by Morse and Hedlund (1940), who showed that they admit a remarkable number-theoretic characterization as discrete codings of lines with irrational slope, which relates them naturally to dynamical systems, in particular the Euclidean dynamical system. These words had never been studied from a probabilistic perspective. Here, we quantify the recurrence properties of a “random” Sturmian word, which are dictated by the so-called “recurrence function”. We perform a complete asymptotic probabilistic study of this function, quantifying its mean and describing its distribution under two different probabilistic models with different virtues: one is a natural choice from an algorithmic point of view (though innovative from the point of view of dynamical analysis), while the other allows a natural quantification of the worst-case growth of the recurrence function. We discuss the relation between these two models and their respective techniques, and explain how the two seemingly different techniques could be linked through the Mellin transform. We also discuss our ongoing work on two special families of Sturmian words: those associated with a quadratic irrational slope, and those with a rational slope (which are not properly Sturmian). Our work suggests that a unified study is possible.
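The description of Sturmian words as discrete codings of lines can be illustrated with the lower mechanical word of slope α = (√5 − 1)/2 and intercept 0, whose n-th letter is ⌊(n+1)α⌋ − ⌊nα⌋. The sketch below computes these floors exactly with integer square roots (so no floating-point error creeps in) and checks the classical fact that a Sturmian word has exactly n + 1 distinct factors of length n. The slope, the intercept and the helper names are illustrative choices; the recurrence function studied in the thesis is not computed here.

```python
from math import isqrt


def floor_n_alpha(n: int) -> int:
    """Exact floor(n * alpha) for alpha = (sqrt(5) - 1) / 2 and n >= 0.

    n * alpha = (n * sqrt(5) - n) / 2, and floor((y - n) / 2) equals
    floor((floor(y) - n) / 2) for integer n, with floor(n * sqrt(5)) = isqrt(5 n^2).
    """
    return (isqrt(5 * n * n) - n) // 2


def mechanical_word(length: int) -> str:
    """Prefix of s_n = floor((n + 1) * alpha) - floor(n * alpha), a Sturmian word."""
    return "".join(str(floor_n_alpha(n + 1) - floor_n_alpha(n))
                   for n in range(length))


def factor_complexity(word: str, n: int) -> int:
    """Number of distinct factors (contiguous subwords) of length n in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


if __name__ == "__main__":
    w = mechanical_word(2000)
    print(w[:13])
    print([factor_complexity(w, n) for n in range(1, 11)])
```

On a long enough prefix every factor of small length already occurs, so the printed counts are n + 1; a rational slope would instead give an eventually periodic word with bounded complexity.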

The Continued Logarithm Algorithm, introduced by Gosper in HAKMEM (1978) as a mutation of classical continued fractions, computes the greatest common divisor of two natural numbers by performing division-like steps involving only binary shifts and subtractions. Its worst-case performance was studied recently by Shallit (2016), who showed a precise upper bound on the number of steps and gave a family of inputs attaining this bound. In this dissertation, we employ dynamical analysis to study the average running time of the algorithm, giving precise mathematical constants for the asymptotics, as well as for other parameters of interest. The underlying dynamical system is akin to the Euclidean one, and was first studied by Chan (around 2005) from an ergodic point of view, but the presence of powers of 2 in the quotients gives the central parameters a dyadic flavour that cannot be captured solely by studying this system. We thus introduce a dyadic component and deal with a two-component system. With this new mixed system at hand, we then provide a complete average-case analysis of the algorithm by Dynamical Analysis.
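The division-like step can be sketched on a rational input x = p/q ≥ 1, following the usual definition of the expansion x = 2^a0 + 2^a0 / (2^a1 + 2^a1 / (...)): find the largest a with 2^a·q ≤ p by shifting q, emit a, and, unless p = 2^a·q, continue with the pair (2^a·q, p − 2^a·q), which corresponds to x ↦ 2^a / (x − 2^a). Only shifts, comparisons and subtractions are used, and the expansion of a rational number is finite. The function names are illustrative, and the sketch does not reproduce the gcd bookkeeping or the dyadic analysis of the thesis.

```python
from fractions import Fraction
from typing import List


def continued_log(x: Fraction) -> List[int]:
    """Continued logarithm terms [a0, a1, ...] of a rational x >= 1."""
    if x < 1:
        raise ValueError("x must be >= 1")
    p, q = x.numerator, x.denominator
    terms: List[int] = []
    while True:
        # Largest a with 2**a * q <= p, found with shifts and comparisons only.
        a = 0
        while (q << (a + 1)) <= p:
            a += 1
        terms.append(a)
        remainder = p - (q << a)      # the subtraction step
        if remainder == 0:            # x was exactly 2**a: the expansion ends
            return terms
        p, q = q << a, remainder      # x  <-  2**a / (x - 2**a)


def evaluate(terms: List[int]) -> Fraction:
    """Rebuild x = 2**a0 + 2**a0 / (2**a1 + 2**a1 / (...)) from its terms."""
    value = Fraction(2) ** terms[-1]
    for a in reversed(terms[:-1]):
        value = Fraction(2) ** a + Fraction(2) ** a / value
    return value


if __name__ == "__main__":
    for x in [Fraction(3), Fraction(5, 2), Fraction(7), Fraction(355, 113)]:
        print(x, continued_log(x))
```

For instance 7 = 4 + 4/(4/3) and 4/3 = 1 + 1/3, which is why the terms of 7 start with 2 and then 0.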

Thesis defenses
Wednesday 26 September 2018, 14:00, Bâtiment Sophie Germain
Vitalii Aksenov (IRIF) Synchronization costs in parallel programs and concurrent data structures

To use the computational power of modern computing machines, we have to deal with concurrent programs. Writing efficient concurrent programs is notoriously difficult, primarily because synchronization costs must be kept under control. In this thesis, we focus on synchronization costs in parallel programs and concurrent data structures. First, we present a novel granularity control technique for parallel programs designed for the dynamic multithreading environment. Then, in the context of concurrent data structures, we consider the notion of concurrency-optimality and propose the first implementation of a concurrency-optimal binary search tree that, intuitively, accepts a concurrent schedule if and only if the schedule is correct. We also propose parallel combining, a technique that enables efficient implementations of concurrent data structures from their parallel batched counterparts. We validate the proposed techniques via experimental evaluations showing superior or comparable performance with respect to state-of-the-art algorithms. From a more formal perspective, we consider the phenomenon of helping in concurrent data structures. Intuitively, helping is observed when the order of some operation in a linearization is fixed by a step of another process. We show that no wait-free linearizable implementation of a stack using read, write, compare&swap and fetch&add primitives can be help-free, correcting a mistake in an earlier proof by Censor-Hillel et al. Finally, we propose a simple way to analytically predict the throughput of data structures based on coarse-grained locking.
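Coarse-grained locking, the setting of the last contribution above, protects a whole data structure with a single lock, so every operation is serialized. Because at most one operation holds the lock at any time, throughput can never exceed one operation per critical-section duration, which is presumably the kind of bound an analytical throughput model has to refine. The minimal Python stack below only illustrates the synchronization pattern; it is neither the throughput model nor any of the concurrent algorithms of the thesis, and the class name is hypothetical.

```python
import threading
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class CoarseLockedStack(Generic[T]):
    """A stack whose every operation runs under one global lock."""

    def __init__(self) -> None:
        self._items: List[T] = []
        self._lock = threading.Lock()

    def push(self, item: T) -> None:
        with self._lock:
            self._items.append(item)

    def pop(self) -> Optional[T]:
        """Remove and return the top item, or None if the stack is empty."""
        with self._lock:
            return self._items.pop() if self._items else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


if __name__ == "__main__":
    stack: CoarseLockedStack[int] = CoarseLockedStack()

    def worker(base: int) -> None:
        for i in range(1000):
            stack.push(base + i)

    threads = [threading.Thread(target=worker, args=(k * 1000,)) for k in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(stack))  # 4000: no push is lost
```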

Thesis defenses
Tuesday 25 September 2018, 14:00, Room 3052, Bâtiment Sophie Germain
Yann Hamdaoui (IRIF) Concurrency, References and Linear Logic

The topic of this thesis is the encoding of references and concurrency in Linear Logic. Our aim is to demonstrate the capability of Linear Logic to encode side effects, so as to make it a viable, formalized and well-studied compilation target for functional languages in the future. The key notion we develop is that of routing areas: a family of proof nets that correspond to a fragment of differential linear logic and implement communication primitives. We develop routing areas as a parametrizable device and study their theory. We then illustrate their expressivity by translating a λ-calculus featuring concurrency, references and replication to a fragment of differential nets. To this end, we introduce a language akin to Amadio’s concurrent λ-calculus, but with explicit substitutions for both variables and references. We endow this language with a type and effect system, and we prove termination of well-typed terms by combining reducibility with a new interactive technique. This intermediate language allows us to prove a simulation theorem and an adequacy theorem for the translation.

Thesis defenses
Thursday 20 September 2018, 10:00, 1828 (Olympe de Gouges)
Matthieu Boutier Routage sensible à la source (Source-sensitive routing)

In next-hop routing, the routing paradigm used in the global Internet, each router chooses the next hop of each packet according to its destination address. Source-sensitive routing is a backward-compatible extension of next-hop routing in which the choice of next hop depends on the packet's source address in addition to its destination address. In this thesis, we show that source-sensitive routing is well suited to routing multihomed networks with multiple addresses, that existing distance-vector routing protocols can be extended to it in a compatible way, and that this routing paradigm usefully gives hosts more flexibility. We first show that some systems do not correctly order source-sensitive entries in their routing tables, and we define an algorithm suited to routing protocols to remedy this. We show how to extend distance-vector protocols to source-sensitive routing in a compatible way. We validate our approach by designing an extension of an existing protocol (Babel), by producing the first complete implementation of a source-sensitive protocol, and by using this protocol to route a multihomed network. Finally, we show that source-sensitive routing offers multipath capabilities to the upper layers of hosts. We check that it integrates with existing technologies (MPTCP), and we design optimization techniques for lightweight applications. We evaluate these techniques after implementing them in an existing application (mosh).
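A source-sensitive routing table can be sketched as a set of (destination prefix, source prefix, next hop) entries. When several entries match a packet, some order is needed; the sketch assumes a destination-first rule (the most specific destination prefix wins, ties being broken by the most specific source prefix), which behaves like plain next-hop routing on destination-only entries. This ordering is an assumption made for illustration, not a restatement of the algorithm defined in the thesis; the addresses use the 2001:db8::/32 documentation prefix and the next-hop names are made up.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dst: ipaddress.IPv6Network
    src: ipaddress.IPv6Network
    next_hop: str


def route(dst: str, src: str, next_hop: str) -> Route:
    return Route(ipaddress.ip_network(dst), ipaddress.ip_network(src), next_hop)


def lookup(table: List[Route], dst: str, src: str) -> Optional[str]:
    """Among entries matching both addresses, prefer the longest destination
    prefix, then the longest source prefix (destination-first ordering)."""
    d, s = ipaddress.ip_address(dst), ipaddress.ip_address(src)
    candidates = [r for r in table if d in r.dst and s in r.src]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: (r.dst.prefixlen, r.src.prefixlen))
    return best.next_hop


if __name__ == "__main__":
    table = [
        route("::/0", "::/0", "isp-a"),                 # default route
        route("::/0", "2001:db8:1::/48", "isp-b"),      # packets sourced from ISP B's prefix
        route("2001:db8:ff::/48", "::/0", "internal"),  # a local destination
    ]
    print(lookup(table, "2001:db8:42::1", "2001:db8:1::5"))  # isp-b
    print(lookup(table, "2001:db8:ff::1", "2001:db8:1::5"))  # internal
    print(lookup(table, "2001:db8:42::1", "2001:db8:2::7"))  # isp-a
```

In a multihomed network, the second entry is what sends traffic carrying ISP B's source addresses out through ISP B, as source-address filtering by upstream providers typically requires.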

Thesis defenses
Wednesday 19 September 2018, 14:00, Room 3052, Bâtiment Sophie Germain
Laurent Feuilloley (IRIF) Certification locale en calcul distribué : sensibilité aux erreurs, uniformité, redondance et interactivité (Local certification in distributed computing: error sensitivity, uniformity, redundancy and interactivity)

This thesis is about local certification, a central topic in distributed decision, which is itself a field of distributed computing. In distributed decision, the nodes of a network decide, in a distributed manner, whether the network is in a correct configuration or not, according to some predicate. This decision is said to be local because the nodes of the network can only communicate with their neighbors. After communicating, each node makes a decision expressing whether the network is correct or not locally, that is, correct given the partial information gathered so far. The network is declared globally correct if it is declared locally correct by every node.

Because of the locality constraint, few predicates can be checked this way. Local certification is a way around this difficulty, and makes it possible to decide all predicates. It consists in labeling the nodes of the network with so-called certificates, which can be checked locally by a distributed algorithm. A local certification scheme is correct if only networks in a correct configuration can be certified. Local certification is not only appealing from a theoretical point of view, as a form of distributed non-determinism; above all, it is a very useful concept for studying fault-tolerant algorithms, where a key step consists in checking the state of the network based on information stored by the nodes.
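A standard textbook example of local certification (not one taken from the thesis) is certifying that a network is bipartite: each node receives a one-bit certificate, its colour, and accepts if and only if all its neighbours hold the other colour. The network is accepted when every node accepts. A scheme is correct when only correct configurations can be certified, and here an odd cycle is never accepted, whatever certificates are assigned. The sketch below checks this by brute force on small cycles; the function names are illustrative.

```python
from itertools import product
from typing import Dict, List

Graph = Dict[int, List[int]]


def node_accepts(adj: Graph, certificates: Dict[int, int], v: int) -> bool:
    """Local verifier at v: it only reads its own and its neighbours' certificates."""
    return all(certificates[u] != certificates[v] for u in adj[v])


def network_accepts(adj: Graph, certificates: Dict[int, int]) -> bool:
    """Global acceptance: every node must accept locally."""
    return all(node_accepts(adj, certificates, v) for v in adj)


def cycle(n: int) -> Graph:
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def some_certificate_accepted(adj: Graph) -> bool:
    """Brute force over all one-bit certificate assignments (small graphs only)."""
    nodes = list(adj)
    return any(network_accepts(adj, dict(zip(nodes, bits)))
               for bits in product((0, 1), repeat=len(nodes)))


if __name__ == "__main__":
    print(some_certificate_accepted(cycle(6)))  # True: even cycles are bipartite
    print(some_certificate_accepted(cycle(5)))  # False: no certificate fools every node
```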

This thesis addresses four aspects of local certification: error sensitivity, uniformity, redundancy and interactivity. The study of these four topics is motivated by one essential question: how can we reduce the resources needed for certification and/or allow better fault tolerance? Addressing this question requires an in-depth understanding of the certification mechanism. To that end, this thesis answers the following questions. How redundant must certificates be to ensure correct certification? Are classical certification schemes robust to a change in the correctness condition? Does introducing interactivity into the process change the complexity of certification?

Keywords: distributed network computing, distributed decision, local certification, proof-labeling schemes, fault tolerance.

Thesis defenses
Tuesday 18 September 2018, 14:00, 580F (Halle aux Farines)
Guillaume Claret (IRIF) Program in Coq

In this thesis, we develop new techniques to conveniently write formally verified programs. To proceed, we study the use of Coq as a programming language in different settings. Coq being a purely functional language, we mainly focus on the representation and on the specification of impure effects, like exceptions, mutable references, inputs-outputs, and concurrency.

First, we work on two preliminary projects helping us to understand the challenges of programming in Coq. The first project, Cybele, is a Coq plugin to write efficient proofs by reflection with effects. We compile and execute the impure effects in OCaml to generate a prophecy, a kind of certificate, and then interpret the effects in Coq using the prophecy. The second project, the compiler CoqOfOCaml, imports OCaml programs with effects into Coq, using an effect inference system.

Next, we describe different generic and composable representations of impure effects in Coq. The breakable computations combine the standard effects of exceptions and mutable references with a pause mechanism that makes the evaluation steps explicit, in order to represent the concurrent evaluation of two terms. By implementing the Pluto web server in Coq, we realized that the most important effects to program are asynchronous inputs-outputs. Indeed, these effects are ubiquitous and cannot be encoded in a purely functional manner. Thus, we design the asynchronous computations as a first way to represent and compile programs with events and handlers in Coq.

Then, we study techniques to prove properties about programs with effects. We start with the verification of the blog system ChickBlog written in the language of the interactive computations. This blog runs one worker with synchronous inputs-outputs per client. We verify our blog using the method of specification by use cases. We adapt this technique to type theory by expressing a use case as a well-typed co-program over the program we verify. Thanks to this formalism, we can present a use case as a symbolic test program and symbolically debug it, step by step, using the interactive proof mode of Coq. To our knowledge, this is the first such adaptation of the use case specifications in type theory. We believe that the formal specification by use cases is one of the keys to verify effectful programs, as the method of use cases proved to be convenient to express (informal) specifications in the software industry. We extend our formalism to concurrent and potentially non-terminating programs with the language of concurrent computations. Apart from the use case method, we design a model-checker to verify the deadlock freeness of concurrent computations, by compiling the parallel composition to the non-deterministic choice operator.
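The idea of checking deadlock freedom by turning parallel composition into non-deterministic choice can be illustrated outside Coq: in every state, any process whose next step is enabled may move, and an explicit-state search looks for a state where some process has not finished but no step is enabled. The sketch below is written in Python rather than Coq and models processes as hypothetical lists of lock acquire/release steps; it illustrates the principle only and is not the model-checker of the thesis.

```python
from typing import FrozenSet, List, Optional, Tuple

Step = Tuple[str, str]                          # ("acquire" | "release", lock name)
State = Tuple[Tuple[int, ...], FrozenSet[str]]  # program counters, locks held


def find_deadlock(procs: List[List[Step]]) -> Optional[Tuple[int, ...]]:
    """Return the program counters of a reachable deadlocked state, or None."""
    start: State = (tuple(0 for _ in procs), frozenset())
    seen = {start}
    todo = [start]
    while todo:
        pcs, held = todo.pop()
        successors = []
        # Parallel composition as non-deterministic choice:
        # every unfinished process whose next step is enabled may move.
        for i, proc in enumerate(procs):
            if pcs[i] == len(proc):
                continue
            op, lock = proc[pcs[i]]
            if op == "acquire":
                if lock in held:
                    continue                  # blocked: the lock is taken
                new_held = held | {lock}
            else:
                new_held = held - {lock}
            new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            successors.append((new_pcs, new_held))
        finished = all(pc == len(p) for pc, p in zip(pcs, procs))
        if not successors and not finished:
            return pcs                        # stuck but not done: a deadlock
        for state in successors:
            if state not in seen:
                seen.add(state)
                todo.append(state)
    return None


if __name__ == "__main__":
    P = [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")]
    Q = [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")]
    print(find_deadlock([P, P]))  # None: both take the locks in the same order
    print(find_deadlock([P, Q]))  # (1, 1): each holds one lock and waits for the other
```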

Thesis defenses
Monday 10 September 2018, 14:00, Amphi Turing, Bâtiment Sophie Germain
Luca Reggio (IRIF) Quantifiers and duality

The unifying theme of the thesis is the semantic meaning of logical quantifiers. In their basic form, quantifiers allow one to state the existence, or non-existence, of individuals satisfying a property. As such, they encode the richness and complexity of predicate logic, as opposed to propositional logic.

We contribute to the semantic understanding of quantifiers, from the viewpoint of duality theory, in three different areas of mathematics and theoretical computer science. First, in formal language theory through the syntactic approach provided by logic on words. Second, in intuitionistic propositional logic and in the study of uniform interpolation. Third, in categorical topology and categorical semantics for predicate logic.

Thesis defenses
Monday 10 September 2018, 14:00, Bâtiment Sophie Germain
Bin Fang (IRIF) Techniques for formal modelling and verification on dynamic memory allocators

The first part of the thesis demonstrates how to obtain formal specifications of free-list sequential dynamic memory allocators (SDMA) using a refinement-based approach. The thesis defines a hierarchy of models, ranked by the refinement relation, that captures a large variety of techniques and policies employed by real-world SDMA. This hierarchy forms an algorithm theory for free-list SDMA and could be extended with other policies. The formal specifications are written in Event-B and the refinements have been proved using the Rodin platform. The thesis investigates applications of the specifications obtained, such as model-based testing, code generation and verification.

The second part of the thesis defines a technique, based on abstract interpretation, for inferring precise invariants of existing SDMA implementations. For this, the thesis defines an abstract domain representing sets of states of the SDMA. The abstract domain is based on a fragment of Separation Logic, called SLMA. This fragment captures properties related to the shape and the content of the data structures used by the SDMA to manage the heap. The abstract domain is defined as a specific product of an abstract domain for heap shapes with an abstract domain for finite arrays of locations. To obtain compact elements of this abstract domain, the thesis proposes a hierarchical organisation of the abstract values: a first level abstracts the list of all chunks, while a second level selects only the chunks available for allocation. The thesis defines transformers of the abstract values that soundly capture the semantics of the statements used in SDMA implementations. A prototype implementation of this abstract domain has been used to analyse simple implementations of SDMA.
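To fix ideas about what a free-list allocator manages, here is a toy first-fit allocator over a simulated heap: allocation splits the first free chunk that is large enough, and release coalesces the freed chunk with adjacent free chunks. First fit and immediate coalescing are only two of the many possible policies; the class and its methods are hypothetical and unrelated to the Event-B models or the SLMA abstract domain of the thesis.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class FirstFitAllocator:
    """Toy free-list allocator over a simulated heap of `size` units."""

    def __init__(self, size: int) -> None:
        self.free: List[Tuple[int, int]] = [(0, size)]  # sorted (start, length) chunks
        self.used: Dict[int, int] = {}                   # start -> length of allocated chunks

    def malloc(self, n: int) -> Optional[int]:
        """Return the start of the first free chunk that fits, splitting it if larger."""
        if n <= 0:
            raise ValueError("size must be positive")
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (start + n, length - n)
                self.used[start] = n
                return start
        return None  # no free chunk is large enough

    def release(self, addr: int) -> None:
        """Give a chunk back to the free list and coalesce adjacent free chunks."""
        n = self.used.pop(addr)  # KeyError on a double free or an unknown address
        bisect.insort(self.free, (addr, n))
        merged: List[Tuple[int, int]] = []
        for start, length in self.free:
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free = merged


if __name__ == "__main__":
    heap = FirstFitAllocator(100)
    a, b, c = heap.malloc(10), heap.malloc(20), heap.malloc(5)
    heap.release(b)
    print(heap.malloc(15), heap.free)  # 10 [(25, 5), (35, 65)]
```

The two lists (all chunks versus free chunks only) mirror, very loosely, the two levels of abstraction described above.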

Thesis defenses
Thursday 5 July 2018, 14:30, 580F (Halle aux Farines)
Guillaume Lagarde (IRIF) Contributions to Arithmetic Complexity and Compression

This thesis explores two territories of computer science: complexity and compression. More precisely, in a first part, we investigate the power of non-commutative arithmetic circuits, which compute multivariate non-commutative polynomials. To this end, we introduce various models of computation that are restricted in the way they are allowed to compute monomials. These models generalize previous ones that have been widely studied, such as algebraic branching programs. The results are of three different types. First, we give strong lower bounds on the number of arithmetic operations needed to compute some polynomials such as the determinant or the permanent. Second, we design a deterministic polynomial-time algorithm to solve the white-box polynomial identity testing problem. Third, we exhibit a link between automata theory and non-commutative arithmetic circuits that allows us to derive some old and new tight lower bounds for some classes of non-commutative circuits, using a measure based on the rank of a so-called Hankel matrix.

A second part is concerned with the analysis of the data compression algorithm called Lempel-Ziv. Although this algorithm is widely used in practice, little is known about its stability. Our main result is to show that an infinite word compressible by LZ’78 can become incompressible by adding a single bit in front of it, thus settling a question raised by Jack Lutz in the late 90s under the name “one-bit catastrophe”. We also give tight bounds on the maximal possible variation between the compression ratio of a finite word and that of its perturbation, obtained by adding one bit in front of it.
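LZ’78 parses a word greedily into phrases, each new phrase being a previously seen phrase extended by one symbol, and the compression ratio is governed by the number of phrases. The minimal sketch below performs the parsing only (it neither encodes the output nor reproduces the one-bit catastrophe construction); the function name is illustrative. Comparing `lz78_phrases(w)` with `lz78_phrases("1" + w)` is the kind of one-bit perturbation studied above.

```python
from typing import List


def lz78_phrases(word: str) -> List[str]:
    """Greedy LZ'78 parsing: each new phrase is a known phrase plus one symbol."""
    dictionary = {""}
    phrases: List[str] = []
    current = ""
    for symbol in word:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)  # trailing phrase, already in the dictionary
    return phrases


if __name__ == "__main__":
    w = "abababab"
    print(lz78_phrases(w))  # ['a', 'b', 'ab', 'aba', 'b']
    print(len(lz78_phrases(w)), len(lz78_phrases("b" + w)))
```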

Thesis defenses
Thursday 5 July 2018, 14:00, Room B107, LIPN, Université Paris 13
Huu Vu Nguyen (IRIF) On CARET Model-Checking of Pushdown Systems: Application to Malware Detection

The amount of malware is growing very fast. Traditional malware detectors based on signature matching or code emulation are easy to get around. To overcome this problem, model checking has emerged as a technique that has recently been applied extensively to malware detection. Pushdown systems were proposed as a natural model for programs, since they allow one to keep track of the stack, while extensions of LTL and CTL were considered for malicious behavior specification. However, LTL- and CTL-like formulas do not allow one to express behaviors with matching calls and returns. In this thesis, we propose to use CARET (a temporal logic of calls and returns) for malicious behavior specification. CARET model checking for Pushdown Systems (PDSs) had never been considered in the literature. Previous works only dealt with the model-checking problem for Recursive State Machines (RSMs). While RSMs are a good formalism to model sequential programs written in structured programming languages like C or Java, they are unsuitable for modeling binary or assembly programs, since explicit pushes and pops of the stack can occur in such programs. Thus, it is very important to have a CARET model-checking algorithm for PDSs. We tackle this problem in this thesis, reducing it to the emptiness problem of Büchi Pushdown Systems. Since CARET formulas for malicious behaviors are huge, we propose to extend CARET with variables, quantifiers and predicates over the stack. This allows one to write compact formulas for malicious behaviors. Our new logic is called Stack linear temporal Predicate logic of CAlls and RETurns (SPCARET). We reduce the malware detection problem to the model-checking problem of PDSs against SPCARET formulas, and we propose efficient algorithms to model check SPCARET formulas for PDSs. We implemented our algorithms in a tool for malware detection and obtained encouraging results.

We then define the Branching temporal logic of CAlls and RETurns (BCARET), which allows one to write branching temporal formulas while taking into account the matching between calls and returns, and we propose algorithms for model checking PDSs against BCARET formulas. Finally, we consider Dynamic Pushdown Networks (DPNs) as a natural model for multithreaded programs with (recursive) procedure calls and thread creation. We show that the model-checking problem of DPNs against CARET formulas is decidable.

Thesis defenses
Thursday 5 July 2018, 10:00, Room B107, LIPN, Université Paris 13
Adrien Pommellet (IRIF) On Model-checking Pushdown System Models

In this thesis, we propose different model-checking techniques for pushdown system models. Pushdown systems (PDSs) are known to be a natural model for sequential programs, as they feature an unbounded stack that can simulate the assembly stack of an actual program. Our first contribution consists in model checking pushdown systems against the logic HyperLTL, which adds existential and universal quantifiers on path variables to LTL. The model-checking problem of HyperLTL has been shown to be decidable for finite-state systems. We prove that this result holds neither for pushdown systems nor for the subclass of visibly pushdown systems. Therefore, we introduce approximation algorithms for the model-checking problem, and show how these can be used to check security policies. In the second part of this thesis, as pushdown systems can fail to accurately represent the way an assembly stack actually operates, we introduce pushdown systems with an upper stack (UPDSs), a model where symbols popped from the stack are not destroyed but instead remain just above its top, and may be overwritten by later push rules. We prove that the sets of successors post* and predecessors pre* of a regular set of configurations of such a system are not always regular, but that post* is context-sensitive; hence, we can decide whether a single configuration is forward reachable or not. We then present methods to over-approximate post* and under-approximate pre*. We also show how these approximations can be used to detect stack overflows and stack pointer manipulations with malicious intent. Finally, in order to analyse multi-threaded programs, we introduce a model called synchronized dynamic pushdown networks (SDPNs), which can be seen as a network of pushdown processes executing synchronized transitions, spawning new pushdown processes, and performing internal pushdown actions. The reachability problem for this model is undecidable. Therefore, we compute an abstraction of the execution paths between two regular sets of configurations, and we then apply this abstraction framework to an iterative abstraction refinement scheme.
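For ordinary pushdown systems (not the UPDSs introduced above), pre* of a regular set of configurations is regular, and a classical saturation procedure computes it on a finite automaton: whenever a rule ⟨p, γ⟩ → ⟨p′, w⟩ exists and the automaton can read w from p′ to some state q, the transition (p, γ, q) is added, until nothing changes. The sketch below follows that textbook procedure; the encoding of rules, stacks (strings, top symbol first) and automata is an assumption, and the input automaton is assumed to have no transitions leading into the states that stand for control locations.

```python
from typing import Iterable, Set, Tuple

Rule = Tuple[str, str, str, str]   # <p, gamma> -> <p2, w>, with len(w) <= 2
Trans = Tuple[str, str, str]       # automaton transition (state, symbol, state)


def reach(trans: Set[Trans], start: str, word: str) -> Set[str]:
    """States reachable from `start` by reading `word` (no epsilon moves)."""
    states = {start}
    for sym in word:
        states = {t for (s, a, t) in trans if s in states and a == sym}
    return states


def pre_star(rules: Iterable[Rule], trans: Iterable[Trans]) -> Set[Trans]:
    """Saturate a P-automaton so that it accepts pre* of its original language."""
    rules = list(rules)
    result = set(trans)
    changed = True
    while changed:
        changed = False
        for p, gamma, p2, w in rules:
            for q in reach(result, p2, w):
                if (p, gamma, q) not in result:
                    result.add((p, gamma, q))
                    changed = True
    return result


def accepts(trans: Set[Trans], final: Set[str], p: str, stack: str) -> bool:
    """Is the configuration <p, stack> (top of stack first) accepted?"""
    return bool(reach(trans, p, stack) & final)


if __name__ == "__main__":
    # Pop 'a' going p -> q, pop 'b' going q -> p, and replace 'c' by "ab" in p.
    rules = [("p", "a", "q", ""), ("q", "b", "p", ""), ("p", "c", "p", "ab")]
    # Target set: the single configuration <p, empty stack>.
    saturated = pre_star(rules, set())
    print(sorted(saturated))
    print(accepts(saturated, {"p"}, "p", "cc"))  # True: <p, cc> can empty its stack in p
```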

Thesis defenses
Tuesday 3 July 2018, 14:00, Room B107, LIPN, Université Paris 13
Khanh Huu The Dam (IRIF) Automatic Learning and Extraction of Malicious Behaviors

Malware detection is nowadays a big challenge. The existing techniques for malware detection require a huge engineering effort to manually extract the malicious behaviors. To avoid this tedious task, we propose in this thesis two approaches: (1) applying Information Retrieval techniques to automatically discover malicious behaviors, and (2) applying machine learning to automatically learn malware. We use API call graphs to represent programs. API call graphs are graphs whose nodes are API functions (i.e., functions supported by the operating system) and whose edges represent the order of execution of the different calls to these functions. To automatically learn malware, we apply well-known learning techniques based on Random Walk Graph Kernels (combined with Support Vector Machines). We achieve a high detection rate with only few false alarms (a 97% detection rate with 0.73% false alarms). As for the automatic extraction of malicious behaviors, we reduce this problem to that of retrieving, from benign and malicious API call graphs, the set of subgraphs that are relevant for malicious behaviors. We solve it by applying and adapting well-known, efficient Information Retrieval techniques based on the TFIDF scheme. We use our automatically extracted malicious behavior specifications for malware detection, using a kind of product between graphs, and obtained interesting experimental results, with a 96% detection rate. Using our two approaches, we were able to detect several malware samples that well-known and widely used antivirus products such as Panda, Avira, Kaspersky, Avast, Qihoo360, McAfee, AVG, BitDefender, ESET-NOD32, F-Secure and Symantec could not detect.
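As a rough illustration of how a TF-IDF weighting singles out behaviour specific to a few programs, the sketch below treats each API call graph as a document whose terms are its edges: an edge occurring in every graph gets weight zero, while an edge confined to few graphs gets a positive weight. This is plain TF-IDF on edges, not the adaptation of the thesis (which retrieves relevant subgraphs); the graphs are made-up examples, although the API names are real Windows functions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (earlier API call, later API call)


def tfidf_scores(graphs: Dict[str, List[Edge]]) -> Dict[str, Dict[Edge, float]]:
    """Textbook TF-IDF where each API call graph is a 'document' of edges."""
    n_graphs = len(graphs)
    doc_freq: Counter = Counter()
    for edges in graphs.values():
        doc_freq.update(set(edges))  # number of graphs containing each edge
    scores: Dict[str, Dict[Edge, float]] = {}
    for name, edges in graphs.items():
        counts = Counter(edges)
        total = sum(counts.values())
        scores[name] = {
            edge: (count / total) * math.log(n_graphs / doc_freq[edge])
            for edge, count in counts.items()
        }
    return scores


if __name__ == "__main__":
    graphs = {  # made-up example data
        "benign_1": [("CreateFileW", "ReadFile"), ("ReadFile", "CloseHandle")],
        "benign_2": [("CreateFileW", "WriteFile"), ("ReadFile", "CloseHandle")],
        "sample_x": [("RegOpenKeyExW", "RegSetValueExW"), ("ReadFile", "CloseHandle")],
    }
    ranked = sorted(tfidf_scores(graphs)["sample_x"].items(), key=lambda kv: -kv[1])
    print(ranked[0][0])  # ('RegOpenKeyExW', 'RegSetValueExW'): absent from the other graphs
```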

Thesis defenses
Tuesday 3 July 2018, 14:00, Room 580F (Halle aux Farines)
Thibaut Girka (IRIF) Differential program semantics

Computer programs are rarely written in one fell swoop. Instead, they are written through a series of incremental changes. It is also common for software to be updated after its initial release. Such changes can occur for various reasons, such as adding features, fixing bugs, or improving performance. It is therefore important to be able to represent and reason about those changes, making sure that they indeed implement the intended modifications. In practice, program differences are very commonly represented as textual differences between a pair of source files, listing text lines that have been deleted, inserted or modified. This representation, while exact, does not address the semantic implications of those textual changes. Therefore, there is a need for better representations of the semantics of program differences.

Our first contribution is an algorithm for the construction of a correlating program, that is, a program interleaving the instructions of two input programs in such a way that it simulates their semantics. Further static analysis can be performed on such correlating programs to compute an over-approximation of the semantic differences between the two input programs. This work draws direct inspiration from an article by Partush and Yahav that describes a correlating program construction algorithm, which we show to be unsound on loops that include `break` or `continue` statements. To guarantee its soundness, our alternative algorithm is formalized and mechanically checked within the Coq proof assistant.

Our second and most important contribution is a formal framework that allows one to precisely describe and formally verify semantic changes. This framework, fully formalized in Coq, represents the difference between two programs by a third program called an oracle. Unlike a correlating program, such an oracle is not required to interleave the instructions of the programs under comparison, and may “skip” intermediate computation steps. In fact, such an oracle is typically written in a different programming language than the programs it relates, which allows designing correlating oracle languages that are specific to certain classes of program differences and capable of relating crashing programs with non-crashing ones. We design such oracle languages to cover a wide range of program differences on a toy imperative language. We also prove that our framework is at least as expressive as Relational Hoare Logic by encoding several variants of it as correlating oracle languages, proving their soundness in the process.
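To make the notion of a correlating program concrete, here is a minimal hand-written sketch in Python. It is neither the thesis's construction algorithm nor Partush and Yahav's. It assumes two hypothetical versions of a summation loop, where the second changes the loop bound from `<` to `<=`, together with a correlating program that runs both loops in lockstep over renamed copies of their variables (`s1`/`i1` and `s2`/`i2`). The recorded trace shows the kind of relation (here, `s1 == s2` until the extra final iteration) that a static analysis of the correlating program would try to over-approximate.

```python
def sum_v1(n: int) -> int:
    """Hypothetical original version: sums 0 .. n-1."""
    s, i = 0, 0
    while i < n:
        s += i
        i += 1
    return s


def sum_v2(n: int) -> int:
    """Hypothetical modified version: the bound becomes <=, so it sums 0 .. n."""
    s, i = 0, 0
    while i <= n:
        s += i
        i += 1
    return s


def correlated_sum(n: int):
    """Hand-written correlating program for sum_v1 and sum_v2.

    Each program keeps its own renamed variables; the two loop bodies are
    interleaved in lockstep, and a body only runs while its own guard holds.
    After each combined iteration, the pair (s1, s2) is recorded so the
    two executions can be compared step by step.
    """
    s1, i1 = 0, 0
    s2, i2 = 0, 0
    trace = []
    while i1 < n or i2 <= n:
        if i1 < n:        # body of sum_v1
            s1 += i1
            i1 += 1
        if i2 <= n:       # body of sum_v2
            s2 += i2
            i2 += 1
        trace.append((s1, s2))
    return s1, s2, trace


if __name__ == "__main__":
    print(correlated_sum(3))  # (3, 6, [(0, 0), (1, 1), (3, 3), (3, 6)])
```

The sketch deliberately avoids `break` and `continue`. The abstract reports that these are precisely the constructs on which Partush and Yahav's construction turns out to be unsound.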

Soutenances de thèses
Vendredi 15 juin 2018, 14 heures, Bâtiment Sophie Germain
Clément Dervieux (IRIF) Enumeration of oriented planar maps

After a general presentation of planar maps, we define corner polyhedra, studied by Eppstein and Mumford. We then introduce corner triangulations, which are the dual maps of the skeletons of corner polyhedra, and we give some of their properties. We offer a linear-time algorithm to realize corner polyhedra. To this end, the study of corner triangulations leads to enumeration problems. A classical method due to Tutte gives the desired result, making the sequence of Catalan numbers appear. The search for a combinatorial explanation of the presence of Catalan numbers calls for other methods, based on cutting and gluing some parts of corner triangulations. This gives rise to the family of almond triangulations, a new representation of Catalan numbers in bijection with the family of binary trees, which completes our corner polyhedra realization algorithm. We finally conclude this work by trying to generalize our methods to maps whose faces all have a given fixed degree.
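The abstract links corner triangulations to the Catalan numbers and to binary trees. As a reminder of that classical link, and not of the almond-triangulation bijection itself (which is not reproduced here), the Python sketch below computes C_n from its closed form and checks it against the recurrence that counts binary trees with n internal nodes.

```python
from functools import lru_cache
from math import comb


def catalan(n: int) -> int:
    """Closed form C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def count_binary_trees(n: int) -> int:
    """Number of binary trees with n internal nodes.

    A non-empty tree is a root plus a left subtree with k nodes and a
    right subtree with n - 1 - k nodes, which yields the Catalan recurrence.
    """
    if n == 0:
        return 1  # the empty tree
    return sum(count_binary_trees(k) * count_binary_trees(n - 1 - k)
               for k in range(n))


if __name__ == "__main__":
    for n in range(8):
        assert catalan(n) == count_binary_trees(n)
    print([catalan(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```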

Soutenances de thèses
Vendredi 25 mai 2018, 15 heures, Salle 0010, Bâtiment Sophie Germain
Florent Urrutia (IRIF) Information Theory for Multi-Party Peer-to-Peer Communication Protocols

This thesis is concerned with the study of multi-party communication protocols in the asynchronous message-passing peer-to-peer model. We introduce two new information measures, the Public Information Complexity (PIC) and the Multi-party Information Complexity (MIC), and study their properties and how they relate to other fundamental quantities in distributed computing, such as communication complexity and randomness complexity. We then use these two measures to study the parity function and the disjointness function.
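For readers unfamiliar with the two functions named at the end of the abstract, here is a small Python sketch of common k-party formulations. The input conventions are assumptions and are not taken from the thesis. For parity, each player holds one bit and the output is the XOR of all bits. For disjointness, each player holds an n-bit characteristic vector of a subset of {0, ..., n-1}, and the output is 1 exactly when no element belongs to all k subsets. The sketch only evaluates the functions centrally. It does not model the message-passing protocol or its information cost.

```python
from typing import Sequence


def parity(bits: Sequence[int]) -> int:
    """XOR of the bits held by the k players (assumed: one bit per player)."""
    result = 0
    for b in bits:
        result ^= b
    return result


def disjointness(inputs: Sequence[Sequence[int]]) -> int:
    """Assumed k-party set disjointness.

    inputs[p][j] is bit j of player p's n-bit characteristic vector.
    Returns 1 iff no coordinate j is set to 1 for every player, i.e. the
    k subsets have an empty common intersection.
    """
    n = len(inputs[0])
    for j in range(n):
        if all(x[j] == 1 for x in inputs):
            return 0  # element j is shared by all players
    return 1


if __name__ == "__main__":
    print(parity([1, 0, 1]))                                  # 0
    print(disjointness([[1, 0, 1], [0, 1, 1], [1, 1, 0]]))    # 1
    print(disjointness([[1, 0, 1], [0, 1, 1], [1, 1, 1]]))    # 0
```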

Soutenances de thèses
Vendredi 27 avril 2018, 14 heures, Salle 1021, Bâtiment Sophie Germain
Alex B. Grilo (IRIF) Quantum proofs, the Local Hamiltonian problem and applications

Manuscript is available here: https://www.irif.fr/~abgrilo/thesis.pdf