Context-free grammar

In the theory of formal languages, a context-free grammar is a formal grammar that contains only those rewrite rules in which exactly one non-terminal symbol is derived on an arbitrarily long sequence of non-terminal and terminal symbols. Thus, the replacement rules have the form (with non- terminal symbol and string consisting of non-terminal and terminal symbols ).

Because the left-hand side of a rule consists of a single nonterminal symbol, their applicability depends solely on a string, if the non-terminal symbol in the string appears, but not of the context in which it is located. The rules are thus context-free.

The context-free grammars are identical to the type-2 grammars of the Chomsky hierarchy.

  • 4.1 word problem
  • 4.2 ambiguity
  • 4.3 equivalence
  • 4.4 subset
  • 4.5 Association
  • 4.6 average
  • 4.7 complement
  • 5.1 Language of the Palindrome
  • 5.2 Ambiguous Example

Definition

A context-free grammar is a 4 -tuple with the following properties:

  • And are disjoint alphabets
  • Is a finite subset of

Herein, the Kleene hull.

Explanation

This is called set of nonterminals, quantity of terminals, quantity of productions or rules and start symbol. A rule is usually recorded in the form.

According to the definition applies to a rule that is so that exactly one non-terminal is on the left side of the rewrite rule. It is a rule not surrounded on the left by other symbols, and therefore it stand for any string that contains that non-terminal, always the same rules to choose from, no matter what symbols the nonterminal surrounded in a string. In short, the selection of rules is independent of the context of the.

From G generated speech

The context-free grammars generate exactly the context-free languages ​​, that is, each type 2 grammar generates a context-free language and to each context-free language, there is a Type -2 grammar that generates this.

The context-free language, which is generated by the context-free grammar is defined as

.

The relation (derivation ) represents a sequence of rule applications as far as grammar. It must start from the icon as long as non-terminal with the help of rules to be replaced until only terminals remain. Apparently applies.

The context-free languages ​​are exactly the languages ​​that are accepted by a nondeterministic pushdown automata. There also exists a deterministic pushdown automaton is called the language also deterministic context -free. This proper subset of context-free languages, is the theoretical basis for the syntax of most programming languages.

Context-free languages ​​can contain the empty word, eg by a production rule. Some theorems on context-free grammars, however, additionally require that the empty word of it may not be created. There are, for example, only the context-free grammars, a grammar equivalent in Greibach normal form if the empty word by them can not be generated, since in each derivation step exactly one terminal is generated.

Normal Forms

For context-free grammars various normal forms are defined. Under the Chomsky normal form ( CNF), the right sides of the non-terminal productions are limited, that is, on the right side may be either a single terminal symbol or exactly two non-terminal symbols are. If the start symbol is on the left side, the right side of the production, however, may also be the empty word. By an algorithm, each context-free grammar can be converted into the CNF.

A context-free grammar is in Greibach normal form (GNF ) if they do not generate the empty word and the right sides of productions with maximum start a terminal symbol and otherwise contain only non-terminal symbols. Each context-free grammar that does not generate the empty word can be converted to an algorithm in the GNF.

Properties

Word problem

The word problem for context-free languages ​​, ie the problem of whether a word from a context-free grammar can be generated is decidable. Towards the solution of the word problem, a derivation tree can also be generated. This derivation tree is also called the parse tree, and a program that generates a parse tree, a parser is. For each context-free grammar, a parser can be automatically be generated (see also the CYK algorithm). The worst-case runtime complexity of a parser for any context-free grammar is. For subclasses of context-free grammars, parsers can be generated whose maturity is. A typical application of an efficient context-free parser with linear run- time parsing a programming language source code by a compiler.

If a word of the language L () can be generated by the grammar in several different ways, then the grammar is ambiguous. A parser can generate in an ambiguous grammar for a given word, not only one but several derivation trees. Ambiguity is not a problem when only the word problem is to be solved. However, if the different derivation trees a different meaning assigned, then a word with an ambiguous grammar have several different meanings. An example of the need for a clear context-free grammar is a compiler must generate deterministic and unambiguous target executable code for each valid input.

Ambiguity

The problem whether a (any ) context-free grammar is ambiguous or non- ambiguous, is not decidable. However, there are test methods that can detect ambiguity or non - ambiguity of certain sub- classes of context-free grammars. Depending on the test method does not terminate the ambiguity- test or the test returns that the ambiguity can not be determined, if the context-free grammar input element is not part of a particular class of context-free grammars.

Equivalence

The problem of whether two context-free grammars and generate the same language (ie, if) is not decidable.

Subset

The problem whether the language generated by a context-free grammar is also generated by a context-free grammar (ie, if) is not decidable.

Union

The union of the two languages ​​of context-free grammars and can also be generated by a context-free grammar, viz. ( It is assumed that apply and what can be achieved for all but. )

Section

The problem whether the section of the languages ​​of context-free grammars two is also generated by a context-free grammar is not decidable.

Complement

The complement of a context-free grammar is not context-free in general.

Examples

Be a context-free grammar with

Contains four productions or production rules:

Can be generated by the grammar with the following derivation:

Is the derivation tree in term notation. The root and the internal nodes are labeled with nonterminal symbols and the leaves with terminal symbols.

So is.

The example word is not part of the language, since the nonterminal is the start symbol, and every word of the language must be included and the terminal symbols from the Start icon. In formula notation:

Grammar is not ambiguous.

Language of palindromes

The grammar given as generates the language of all palindromes over the alphabet.

Ambiguous example

An example of an ambiguous grammar.

Contains the following productions:

For there are, among other things, discharges, and. So is ambiguous.

Extension

An extension of context-free grammars form stochastic context-free grammars ( SCFG ), also known as probabilistic context-free grammars ( PCFG ). Here, each production rule is assigned a probability of occurrence: so that, for each straight.

This probability of occurrence of the individual rules induce a probability distribution on the set of words generated by the grammar.

A stochastic context-free grammar can be used for example to compute the most probable parse in a syntactically ambiguous grammar for an input word. Another use case is the stochastic sampling of derivation trees under the given rule probabilities of an ambiguous grammar. The language generated by a SCFG is as defined as the language of a CFG. SCFGs are, for example, used in bioinformatics and computational linguistics.

485429
de