Chart-Parser

A chart parser, also written chart parser is a parser for context-free grammars, the partial analyzes ( Teilkonstituenten ) in a table (chart) remembers. This caching and reuse of partial analyzes improves the efficiency greatly and makes parsing of context-free languages ​​to a releasable in polynomial time problem.

Chartparsing is an umbrella term for all Parsverfahren who use such a table. After Parsalgorithmus used one distinguishes several subtypes:

  • Top-down chart parser ( Earley parser )
  • Left -corner chart parser
  • Island chart parser
  • 2.1 shell formation
  • 2.2 Integration of a terminal symbol
  • 2.3 Combining two Teilkonstituenten
  • 5.1 Repayment rules
  • 5.2 Perspectives ( Look Ahead )
  • 5.3 -separated grammar
  • 5.4 robustness

Chart

The chart is an n × n matrix, where n is the length of the sentence is to be analyzed. The spaces between the words of this sentence are numbered from 0 to n. In each chart cells are so-called dotted rules ( dotted rules, see LR parser ).

Formally, a chart as a set of 3 -tuples < i, j, A → α. β > describe, where:

  • I ( 0 ≤ i ≤ n) is the position from which a constituent of category A is expected.
  • J (> = i) position is detected α up to the.
  • A → α. β is a dot typically should be recognized by the β yet; Therefore, β is also called the open part of this rule, α is the closed part. α and β are made of an arbitrary sequence of terminal and non-terminal symbols of the context-free grammar. α and / or β can also be empty, that is, equal to ε, be.

Example

A single chart entry may look like this:

< 2, 5, VP → V NP. NP>

This means:

  • From the set position 2 a verb phrase (VP) is expected.
  • The analysis of the VP has progressed to the set position 5. The closed part V NP VP rule is located between positions 2 and 5.
  • Another NP is from the 5-position expected if the relevant VP- rule can be applied at all.

Parseroperationen

Use chart parser during the analysis normally three different operations:

Shell formation

If < i, j, A → α. B β > ∈ chart and

B → γ is a rule of the grammar,

Then add

< J, j, B →. γ >

In the chart on if this tuple is not yet available.

From the set position j has a constituent of category B is expected. For the expansion of B there exists a rule γ with right side. So you generate a new expectation, γ starting from the position to find j.

Integration of a terminal symbol

If < i, j, A → α. w β > ∈ chart (w is a terminal symbol or pre-terminal ) and

W is the j -th word of the analyzed sentence s = w0w1 ... wn,

Then add

< I, j 1, A → α w. β >

In the chart on if this tuple is not yet available.

The analysis is thus progressed so far that after the position j is a terminal symbol or a word category ( as a verb ) is expected. If the j -th word actually equal to w (or of the part of speech w), then this word can be integrated into the analysis. The point is then moved over the recognized word.

Combination of two Teilkonstituenten

Note: the combination operation described here is that of the top-down chart parser. Other methods of chart parsing, go here before something else.

If < i, j, A → α. B β > ∈ chart (B is a nonterminal symbol ) and

Is < j, k, B → γ. > ∈ Chart

Then add

< I, k, A → α B. β >

In the chart on if this tuple is not yet available.

During the analysis, a complete constituent B was found between positions j and k. In the chart there is another tuple that reflects the expectation of a constituent B starting at position j. Thus, both may be combined into a new tuple which covers the positions i to k. The point was more than the realized constituent B put forth.

Parsalgorithmus

Input: A set s = w0w1 ... wn.

Output: yes, if < 0, n, S '→ S. > ∈ chart, otherwise no

Note: Actually, this is only one detection method. The actual sentence structures but can be reconstructed with some additional administrative information from the chart (so-called shared packed parse forest ).

The steps under 2 are not ordered in their order. Your order can using various search methods ( depth first search, breadth-first search, best search) are systematized.

Example

Given a context-free grammar with the following production rules:

Lexical rules

The parsed sentence was " observed Donald Daisy with binoculars "

Explanation:

  • Pn, m = shell formation ( predict ) of entry n with m production rule,
  • Sn, Lm = integration ( scan) of the shell formation of entry n with m lexical rule,
  • Cn, m = combination (complete) of the two entries n and m

The fact that entry 33 by combination of entry 1 with entry 32 could be formed, showing that the sentence can be parsed in two ways (that is ambiguous ).

Extensions

Amortization rules

Repayment rules are, inter alia, Production rules of the form A → ε. Such rules are usually integrated with special Vorarbeitungsstrategien in the chart parser.

Perspective ( Look Ahead )

Generating unnecessary chart entries can be prevented by integrating k lookahead symbols in the Charttupel. This technique is also used in LR ( k) parsers.

Separated grammar

For parsing natural languages ​​usually called separated grammars are used in which dictionary and phrase structure rules are separated. The right-hand sides of the rule context-free grammar thus contain only terminal symbols ( alphabet symbols ) or non-terminal symbols. This makes the Predict and scanning process more efficient, as they progress only to the level of pre-terminal (word types ).

Robustness

Since the input of the parser in the sense of the grammar are not always well-formed (see the application of the grammar ), it is useful, the parser robust, ie unsusceptible to make for grammatical errors. This concerns, for example, unfamiliar words, then all probable types of words are entered for the scan step, or missing or superfluous words that are recognized with special error productions.

Complexity

O ( n3) for arbitrary context-free grammars, O ( n2 ) for non - ambiguous context-free grammars.

Applications

Chart parsers are usually used in connection with the syntactic analysis of natural languages ​​, since they - have the best time complexity for arbitrary (ie also ambiguous ) context-free grammars - next to the Tomita parser. For example, the grammar checker in Microsoft Word, a chart parser. For programming languages ​​whose syntax has special properties, such as efficient parser LR ( k) are usually - or LL (k ) parser used.

180112
de