Formal grammar

Formal grammars are mathematical models of grammars that are specified by means of the semi-Thue system and can be written and produced by the formal languages. They are used in theoretical computer science, especially in computability theory, and the compiler to firstly describe a formal language unique ( that is, to determine unambiguously whether a word is an element of a language or not) and to the other to investigate properties of these formal languages ​​or to prove. Formal grammars are classified in the Chomsky hierarchy.


With a formal grammar can be prepared starting from a start symbol (also called the start variable ) use production rules from a set of rules that generate from the start symbol of new strings, which in turn can be replaced. This process is called derivation.

The vocabulary of a grammar consisting of terminal symbols and nonterminal symbols are before it can be used for the symbols. The set of terminal symbols defined, from which characters words that can not be derived. These words, taken together, the result described by the formal language grammar. The start symbol must be a nonterminal symbol against it. Additional non-terminal symbols allow more sophisticated rules.

Production rules are defined as tuples that are also written. It applies it to a string by an occurrence of the string is replaced by. On the left side of the rule is always at least one nonterminal symbol must stand. A set of rules with the same left side, ie, is abbreviated as written.

For example, you can see the string either to or derived by the rule set.

Just as on a given string multiple rules may be applicable at the same time, it must not be only one place in the string, the fits a rule. Formal grammars do not impose order. All consisting only of terminal symbols words that can be derived from the start symbol, belong to the described by the grammar language.


A formal grammar is a 4 -tuple consisting of:

  • , A finite amount which is referred to as a vocabulary,
  • , A subset of the alphabet is called and its elements are called terminal symbols,
  • , A finite set of production rules, and
  • , The start symbol.

Herein, the Kleene hull of.

The set is the set of nonterminal symbols or metasymbols.

Be expressed as the tuple, is also common.

The language described by a grammar

A rule of a given grammar says that, so that a new word is created in a word with R as infix, this can be replaced by the word with Q as infix. One speaks here also assume that in the grammar or by the rule passes, or even that has been derived from. This can be noted by (often instead ). If you only want to express that in the grammar of the word may arise from without specifying a rule to write instead (the grammar from the context obvious, even simple). Thus, such a transition from a transition relation, which is a natural extension of all possible contexts, namely:

Is there now a string of words, applies in that for each natural number with the word merges ( ), then in increments of derivable, which is represented by. Such a phrase is called a derivation of invoice or in length. Further means to be derived when there is at least one, so that steps can be derived from. When is found, this is represented by the notation. It is also defined that applies to every word that is so that the relation is the reflexive - transitive closure of the relation.

Now the formal language generated by the grammar is the one language that consists of all the words that are on the one only of terminal symbols and which can be derived from the start symbol to the other with a finite number of steps:

It does not matter the order in which production rules are applied to the derived words, or whether there are several ways to derive a word out. Two grammars are equivalent if and only if the languages ​​generated by and are the same:


Is a grammar with terminal symbols, non terminal symbols, the start symbol and with the rules

It is the empty word, which is a word of length 0. This grammar is defined with the language of all words of the form. Thus, on the basis of the following derivations, the words, and the elements described by Language:

  • , By rule (2),
  • , By means of rules (1), (4) and (6),
  • , By means of the rules ( 1), ( 1), ( 4), ( 3), ( 5), ( 6) and ( 7).

But there are other ways to derive the word out.

Another grammar that describes the same language is context-free grammar with the rules:

Every recursively enumerable language is generated from multiple (and countably infinitely many ) grammars. However, there are languages ​​that can be generated by any grammar.

Classes of grammars

Grammars are assigned to classes, which are characterized by similarities. The best-known classification described Noam Chomsky and Marcel Schützenberger with the Chomsky hierarchy.

Chomsky hierarchy

The Chomsky hierarchy divides the grammars according to the type of production rules in classes of type 0 to type 3 a:

  • Type 0 grammars: phrase structure grammars are unrestricted formal grammars.
  • Type 1 grammars: Context-sensitive grammars may only consist of rules in which exactly one non-terminal symbol is replaced by a string. This symbol shall be surrounded on the left side of the rule and of other symbols that specify a context within which the replacement must take place.
  • Type 2 grammars: on the other hand may only be exactly one non-terminal symbol on the left sides of rules in context-free grammars. The symbol can not be replaced depending on the context.
  • Type 3 grammars: For regular grammars, the left sides of the rules also contain exactly one non-terminal symbol. In the left regular grammars the right side of the rule must consist of at most one non-terminal symbol, the most one terminal symbol follows. In right regular grammars, however, may the right side consist of at most one terminal symbol, the most one nonterminal follows.

The corresponding classes of languages ​​are decreasingly extensive. There is following proper inclusion relation:

For the language with classes of type: .

Other language classes

Apart from the Chomsky hierarchy to more classes have been established to grammars:

  • Monotone grammars describe the same language class as the context-sensitive grammars. Something stringent the growing context-sensitive grammars that describe a subclass of context-sensitive language class.
  • Deterministic context-free grammars describe the deterministic context -free languages ​​. They are described by the LR ( k) grammars, which are for the compiler of importance. Other known in compiler construction grammars are LL (k ) grammars and LF ( k) grammars.