Concatenation hierarchies

One of the most fascinating problem in automata theory is the so-called "dot-depth 2" problem. It admits several equivalent formulations
1. Combinatorial version. Is there an algorithm to decide whether a given rational language is a Boolean combination of languages of the form A*0a1A*1 ... anA*n, where the ai's are letters and the Ai's are subsets of the alphabet ?

2. Algebraic version. Is there an algorithm to decide whether a given finite monoid divides, for some n, a monoid of n by n upper-triangular Boolean matrices ?

3. Logical version. Is there an algorithm to decide whether a first order formula of Büchi's sequential calculus is equivalent (on finite words) with a Boolean combination of Σ2-formulas ?

4. History and known results

5. References

1. Combinatorial version.

Let A be an alphabet. The definition of star-free subsets of A* makes use of two different types of operations: Boolean operations and concatenation product. By alternating the use of these two operations, one gets a hierarchy, called the concatenation hierarchy, defined as follows.
1. The sets of level 0 are the empty set and A*,
2. For every integer n > 0, the sets of level n+1/2 are the finite unions of sets of the form
L0a1L1a2 ... akLk
where L0, L1, ..., Lk are sets of level n and a1, ..., ak are letters.
3. For every integer n > 0, the sets of level n+1 are the finite Boolean combinations of sets of level n+1/2.
Note that a set of level m is also a set of level n for every n > m. The following results are known (Brzozowski and Knast, Perrin and Pin).
1. For every n> 0, the sets of level n are closed under union, intersection, and complement.
2. For every n> 0, the sets of level n+1/2 are closed under union, intersection, and concatenation product.
3. The hierarchy is strict: if A contains at least two letters, then for every n, there exist some sets of level n+1 that are not of level n + 1/2 and some sets of level n + 1/2 that are not of level n.

2. Algebraic version.

A monoid M divides a monoid N if M is a quotient of a submonoid of N.
The Boolean semiring is B = {0, 1}. Addition and multiplication are given in the following tables
 + 0 1 0 0 1 1 1 1
 x 0 1 0 0 0 1 0 1
Matrix multiplication is defined in the usual way. A Boolean upper triangular matrix is a matrix whith 0 entries in position (i, j) if i > j. For instance
 1 0 1 1 0 0 0 1 0 0 1 1 0 0 0 1

is a 4 x 4 upper triangular Boolean matrix. The set Tn of n x n upper triangular matrices is a monoid under the multiplication of Boolean matrices. The problem is to decide whether a given finite monoid divides one of the monoids Tn, for some n > 0.
The corresponding problem for the monoids Un of unitriangular matrices has been solved: a monoid divides a monoid Un for some n > 0, if and only if it is J-trivial.

3. Logical version.

Büchi's sequential calculus is a logical formalism to specify some combinatorial properties of a finite word, for instance "the factor bba occurs three times in the word, but the factor bbb does not occur". Thus, each logical sentence of this calculus defines a language, namely the set of all words that satisfy the property expressed by the formula.

More precisely, to each word u ∈ A+ is associated a structure
Mu = ({1, 2, ..., ¦u¦}, <, (a)a ∈ A )
where < denotes the usual order on {1, 2, ..., ¦u¦} and a is the set of all i such that the i-th letter of u is an a. For instance, if A = {a, b} and u = abaab, then a = {1, 3, 4} and b = {2, 5}. The logical language appropriate to such models has < and the a's as non logical symbols, and formulas are built in the standard way by using these non-logical symbols, variables, Boolean connectives, equality between elements (positions) and quantifiers.
Given a sentence φ, we denote by L(φ) the set of all words which satisfy φ, when words are considered as models. For instance, the formula
∃ i ∃ j  i < j   ∧   a i   ∧   b j
defines the language A*aA*bA*.
Two formulas are said to be (elementary) equivalent if they define the same languages.
It is well known that every logical formula can be written in prenex formal form, that is to say, of the form φ = Q(x1, ...,xk)ψ where Q(x1, ...,xk) is a sequence of existential or universal quantifiers on the variables x1, ...,xk and where ψ is quantifier free. If Q(x1, ...,xk) is formed of n blocks of quantifiers such that the first block contains only existential quantifiers (note that this first block may be empty), the second block universal quantifiers, etc., we say that φ is a Σn-formula. We denote by Σn the set of the Σn-formulae and by BΣn the set of Boolean combinations (Boolean operations on formulae comprise conjunction, disjunction and negation) of Σn-formulae.

The connection between the two hierarchies was established by Thomas (see also Perrin and Pin): a subset of A* is of level n (resp. n+1/2) if and only if it is BΣn-definable (resp. Σn+1-definable).

4. History and known results Section under construction...