Syntactic semigroups.

Syntactic semigroups

Chapter 10 in Handbook of language theory, G. Rozenberg et A. Salomaa (éd.), 1997

Jean-Éric Pin

PostScript file compressed with gzip, PDF file

Résumé. Ce chapitre présente une synthèse de ce qu'on appelle souvent la théorie algébrique des automates. Cette théorie parle de langages, d'automates et de semigroupes et a des connections avec la logique, les circuits booléens, la dynamique symbolique et la topologie.

Abstract. This chapter gives an overview on what is often called the algebraic theory of finite automata. It deals with languages, automata and semigroups, and has connections with model theory in logic, boolean circuits, symbolic dynamics and topology.

Introduction : This chapter gives an overview on what is often called the algebraic theory of finite automata. It deals with languages, automata and semigroups, and has connections with model theory in logic, boolean circuits, symbolic dynamics and topology.

Kleene's theorem is usually considered as the foundation of this theory. It shows that the class of recognizable languages (i.e. recognized by finite automata), coincides with the class of rational languages, which are given by rational expressions. The definition of the syntactic monoid, a monoid canonically attached to each language, was first given in an early paper of Rabin and Scott, where the notion is credited to Myhill. It was shown in particular that a language is recognizable if and only if its syntactic monoid is finite. However, the first classification results were rather related to automata. A break-through was realized by Schützenberger in the mid sixties. Schützenberger made a non trivial use of the syntactic monoid to characterize an important subclass of the rational languages, the star-free languages. Schützenberger's theorem states that a language is star-free if and only if its syntactic monoid is finite and aperiodic.

This theorem had a considerable influence on the theory. Two other important "syntactic" characterizations were obtained in the early seventies: Simon proved that a language is piecewise testable if and only if its syntactic monoid is finite and J-trivial and Brzozowski-Simon and independently, McNaughton characterized the locally testable languages. These successes settled the power of the semigroup approach, but it was Eilenberg who discovered the appropriate framework to formulate this type of results.

A variety of finite monoids is a class of monoids closed under taking submonoids, quotients and finite direct product. Eilenberg's theorem states that varieties of finite monoids are in one to one correspondence with certain classes of recognizable languages, the varieties of languages. For instance, the rational languages correspond to the variety of all finite monoids, the star-free languages correspond to the variety of finite aperiodic monoids, and the piecewise testable languages correspond to the variety of finite J-trivial monoids. Numerous similar results have been established during the past fifteen years and, for this reason, the theory of finite automata is now intimately related to the theory of finite monoids.

It is time to mention a sensitive feature of this theory, the role of the empty word. Indeed, there are two possible definitions for a language. A first definition consists in defining a language on the alphabet A as a subset of the free monoid A^*. In this case a language may contain the empty word. In the second definition, a language is defined as a subset of the free semigroup, which excludes the empty word. This subtle distinction has dramatic consequences on the full theory. First, one has to distinguish between *-varieties (the first case) and +-varieties of languages (the latter case). Next, with the latter definition, monoids have to be replaced by semigroups and Eilenberg's theorem gives a one to one correspondence between varieties of finite semigroups and +-varieties of languages. Although it might seem quite annoying to have two such parallel theories, this distinction proved to be necessary. For instance, the locally testable languages form a +-variety, which correspond to locally idempotent and commutative semigroups. But no characterization of the locally testable languages is known in terms of syntactic monoids.

An extension of the the notion of syntactic semigroup (or monoid) was recently proposed. The key idea is to define a partial order on syntactic semigroups, leading to the notion of syntactic ordered semigroup. The resulting extension of Eilenberg's variety theory permits to treat classes of languages that are not necessarily closed under complement, a major difference with the original theory. We have adopted this new point of view throughout this chapter. For this reason, even our definition of recognizable languages may seem unfamiliar to the reader.

The theory has now developed into many directions and has generated a rapidly growing literature.The aim of this chapter is to provide the reader with an overview of the main results. As these results are nowadays intimately related with non commutative algebra, a certain amount of semigroup theory had to be introduced, but we tried to favor the main ideas rather than the technical developments. Some important topics had to be skipped and are briefly mentioned in the last section. Due to the lack of place, no proofs are given, but numerous examples should help the reader to catch the spirit of the more abstract statements. The references listed at the end of the chapter are far from being exhaustive. However, most of the references should be reached by the standard recursive process of tracing the bibliography of the papers cited in the references.

The chapter is organized as follows. The amount of semigroup theory that is necessary to state precisely the results of this chapter is introduced in Section 2. The basic concepts of recognizable set and syntactic ordered semigroups are introduced in Section 3. The variety theorem is stated in Section 4 and examples follow in Section 5. Some algebraic tools are presented in Section 6. Sections 7 and 8 are devoted to the study of the concatenation product and its variants. Connections with the theory of codes are discussed in Section 9. Section 10 gives an overview on the operators on recognizable languages. Various extensions are briefly reviewed in Section 11.

PostScript file compressed with gzip, PDF file