GLR parser

A Tomita parser (named after Masaru Tomita) is a parsing method for context-free grammars that generalizes the LR(k) method. The method is therefore also called the GLR(k) method (for Generalized LR(k)).

The starting point of the Tomita parser is the table construction procedure of the LR(k) method. For grammars that do not have the LR(k) property (including, but not limited to, ambiguous grammars), this procedure produces table cells with multiple entries, so-called conflicts:

  • Shift-reduce conflict: the parser can either place the next input symbol on its stack or replace a recognized right-hand side of a production rule by the left-hand side of that rule.
  • Reduce-reduce conflict: there are at least two production rules by which a reduction can take place.
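
The two kinds of conflict can be told apart mechanically once a table cell is allowed to hold a list of actions. The following is a minimal sketch under assumptions: the table layout, the action encoding and the entries in `EXAMPLE_TABLE` are hypothetical and are not taken from any real parser generator. A cell that holds a shift together with several reductions is reported here simply as shift-reduce.

```python
from typing import Dict, List, Tuple

# Assumed encoding: an action is ("shift", target_state) or ("reduce", rule_index).
Action = Tuple[str, int]
Cell = Tuple[int, str]  # (parser state, lookahead symbol)


def classify_conflicts(table: Dict[Cell, List[Action]]) -> Dict[Cell, str]:
    """Return the kind of conflict for every cell that holds more than one action."""
    conflicts: Dict[Cell, str] = {}
    for cell, actions in table.items():
        if len(actions) < 2:
            continue  # a single entry is an ordinary LR action
        shifts = sum(1 for kind, _ in actions if kind == "shift")
        reduces = sum(1 for kind, _ in actions if kind == "reduce")
        if shifts and reduces:
            conflicts[cell] = "shift-reduce"
        elif reduces >= 2:
            conflicts[cell] = "reduce-reduce"
    return conflicts


# Hypothetical table fragment, invented purely for illustration.
EXAMPLE_TABLE: Dict[Cell, List[Action]] = {
    (4, "with"): [("shift", 7), ("reduce", 2)],
    (9, "$"): [("reduce", 3), ("reduce", 5)],
    (1, "the"): [("shift", 2)],
}
```

An LR(k) table generator would reject such a table; a GLR(k) parser keeps every entry and follows each of them.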

The Tomita parsing algorithm pursues these conflicts further in pseudo-parallel. As its data structure it uses a so-called graph stack (graph-structured stack), a directed acyclic graph that represents all parsing operations completed so far.
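
A graph-structured stack can be pictured as stack nodes that may have more than one predecessor. The sketch below is one possible representation, not the layout used by any particular implementation; the class name `GssNode`, the function `paths` and the state numbers are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # nodes are compared by identity, not by content
class GssNode:
    state: int
    # Edges point toward the bottom of the stack; several edges mean
    # that several alternative stacks share this node as their top.
    predecessors: List["GssNode"] = field(default_factory=list)


def paths(node: GssNode, length: int) -> List[List[GssNode]]:
    """Return every path with `length` edges that leads downward from `node`."""
    if length == 0:
        return [[node]]
    found: List[List[GssNode]] = []
    for pred in node.predecessors:
        for rest in paths(pred, length - 1):
            found.append([node] + rest)
    return found


# Two alternative stacks, 0-3-7 and 0-5-7, share their bottom and top nodes.
bottom = GssNode(0)
left = GssNode(3, [bottom])
right = GssNode(5, [bottom])
top = GssNode(7, [left, right])
state_paths = [[n.state for n in p] for p in paths(top, 2)]
```

Enumerating the paths of a given length below a node is the operation a reduction needs, because each path corresponds to one possible right-hand side lying on the stack.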

Graph stack

As in a chart parser, the parse results are represented by means of a directed acyclic graph.

Figure 1 shows such a graph after the end of the parsing process for the example sentence "they observed the burglar with binoculars".

The grammar used in this case is ambiguous. For the example sentence, it allows two different syntactic analyses. Because of this ambiguity, it does not have the LR(k) property, which results in multiple entries in the parse table.
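
To make the ambiguity concrete, the sketch below counts the parse trees of the example sentence. The grammar and lexicon in it are assumptions: they are a toy grammar invented for this illustration, not the grammar of the article, and the counting routine is a plain span-based recursion rather than the GLR algorithm.

```python
from functools import lru_cache
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical lexicon: word -> categories it may have.
LEXICON: Dict[str, Set[str]] = {
    "they": {"NP"},
    "observed": {"V"},
    "the": {"Det"},
    "burglar": {"N"},
    "with": {"P"},
    "binoculars": {"NP"},
}

# Hypothetical binary rules: left-hand side -> list of right-hand sides.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("VP", "PP")],   # PP attached to the verb phrase
    "NP": [("Det", "N"), ("NP", "PP")],  # PP attached to the noun phrase
    "PP": [("P", "NP")],
}


def count_parses(start: str, words: Sequence[str]) -> int:
    """Count the distinct parse trees that derive `words` from `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(symbol: str, i: int, j: int) -> int:
        if j - i == 1:  # a single word: look it up in the lexicon
            return 1 if symbol in LEXICON.get(words[i], set()) else 0
        total = 0
        for left, right in RULES.get(symbol, []):
            for k in range(i + 1, j):  # every way to split the span
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))


SENTENCE = "they observed the burglar with binoculars".split()
```

Under this toy grammar, `count_parses("S", SENTENCE)` returns 2: in one tree the prepositional phrase modifies the verb phrase, in the other it modifies "the burglar".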

Each graph node has a unique node name (starting with "n"). The red nodes contain elements of the grammar's vocabulary, i.e. non-terminal symbols on the one hand and words (terminal symbols) on the other. The blue nodes, by contrast, contain state numbers of the LR parse table. The figure shows nicely how two analyses that differ up to that point converge again at node n21, where the preposition "with" is incorporated. The following prepositional phrase "with the binoculars" is therefore analyzed only once.

Parsing algorithm

As in the LR(k) method, the parsing process consists of a series of table-driven shift and reduce steps. In a shift step, a word is removed from the input and placed on the stack. In a reduce step, the right-hand side (γ) of a production rule A → γ, which lies on the stack in reverse order when read from the top, is replaced by the left-hand side of the rule (A). In contrast to the LR(k) method, the reduced part is not deleted from the stack but is preserved. This means that the stack grows monotonically.
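
The non-destructive reduction can be sketched with an immutable linked stack. This is a simplified illustration under assumptions: the names `Node`, `shift_step`, `reduce_step` and `symbols` are hypothetical, the stack holds grammar symbols only (no state numbers), and the table lookup that decides which step to take is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Node:
    symbol: str
    below: Optional["Node"]  # next node toward the bottom; None at the bottom


def shift_step(top: Optional[Node], word: str) -> Node:
    """Place a word on the stack by creating a new top node."""
    return Node(word, top)


def reduce_step(top: Optional[Node], lhs: str, rhs: Sequence[str]) -> Node:
    """Apply lhs -> rhs without deleting the nodes that hold rhs."""
    node = top
    # Read from the top, the right-hand side appears in reverse order.
    for expected in reversed(rhs):
        if node is None or node.symbol != expected:
            raise ValueError("right-hand side is not on top of the stack")
        node = node.below
    # The new node hangs off the node below the handle; the old nodes survive.
    return Node(lhs, node)


def symbols(top: Optional[Node]) -> List[str]:
    """List the symbols of one stack from bottom to top."""
    out: List[str] = []
    while top is not None:
        out.append(top.symbol)
        top = top.below
    return out[::-1]


before = shift_step(shift_step(shift_step(None, "V"), "Det"), "N")
after = reduce_step(before, "NP", ("Det", "N"))
```

After the reduction, `symbols(after)` is `['V', 'NP']`, while `symbols(before)` still yields `['V', 'Det', 'N']`: the old top remains reachable, so nodes are only ever added.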

The process is controlled by the GLR(k) table. At any time the parser is in a well-defined state (see pushdown automaton). Using this state and the next symbol(s) of the input, the GLR(k) table is consulted to determine the next operation (shift, reduce, accept, error) and the next state. Where the table has multiple entries (conflicts), all of them are pursued quasi-parallel. Subsequent shift operations can, however, bring these parallel lines of processing back together; this is important for the time complexity of the method.
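
How a shift can bring parallel branches back together may be sketched as follows. This is an illustrative simplification, not a complete GLR driver: `GssNode`, `shift_all` and the state numbers are assumptions, and the mapping `shift_targets` stands in for the shift entries that the GLR(k) table holds for the current input word.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # nodes are compared by identity
class GssNode:
    state: int
    predecessors: List["GssNode"] = field(default_factory=list)


def shift_all(tops: List[GssNode], shift_targets: Dict[int, int]) -> List[GssNode]:
    """Shift one word on every active stack top.

    `shift_targets` maps a current state to the state entered by the shift.
    Tops whose state has no entry hit an error entry and are dropped.
    Tops that reach the same target state share a single new node.
    """
    merged: Dict[int, GssNode] = {}
    for top in tops:
        target = shift_targets.get(top.state)
        if target is None:
            continue  # this branch of the analysis dies here
        node = merged.get(target)
        if node is None:
            node = merged[target] = GssNode(target)
        node.predecessors.append(top)
    return list(merged.values())


# Three parallel branches; two of them shift into the same state 7.
branch_a, branch_b, branch_c = GssNode(4), GssNode(6), GssNode(9)
new_tops = shift_all([branch_a, branch_b, branch_c], {4: 7, 6: 7})
```

Here `new_tops` contains a single node in state 7 with two predecessors, so whatever follows in the input is processed once for both branches rather than once per branch.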

Relationship to other parsing methods

The Tomita parser is a precompiled chart parser.
