Parsing

A parser [ pɑ ː ʁzɐ ] ( engl. to parse " analyze ", or Latin pars, "part", in German sometimes chopper ) is a computer program for those in the computer science for the decomposition and transformation of an arbitrary input to a further processing is usable format responsible. Frequently parser can be used to unlock the semantics of the input following the analysis process and then perform actions.

Compared to a recognizer which analyzes the input, and outputs, if the latter is in accordance with the instructions correctly or incorrectly, the parser is an input of the analysis in a desired shape, and in addition produces structural descriptions.

The syntax analysis ( parsing ) is outside of the computer science applications, such as in the examination of the structure of the natural languages. In the grammar, the syntax analysis of a set would correspond to the dismantling of the block in its grammatical elements ( syntax). See linguistics.

Application and Examples

In general, a parser is used to translate a text into a new structure, such as a syntax tree, which expresses the hierarchy between the elements.

  • HTML code is pure text. The parser contained in a web browser created from the logical structure of a data structure. The appearance of these elements is separately defined via CSS.
  • RSS parser convert RSS feeds into another data format, such as an HTML page.
  • XML parser parse XML documents and make the information contained therein (ie elements, attributes, etc.) for further processing.
  • URI parser solve schemes such as URLs in their hierarchical structure on (see RFC 3986 ).
  • Log file parser used to extract relevant information from Web server log files, event logs, and other information stored in log files for automated analysis.
  • Search engines crawl and parse websites relevant text passages.
  • Reading a programming language. From the resulting data structure a compiler can then generate machine code or bytecode.
  • A command-line interpreter parses commands along with their parameters for the correct execution of the user (eg via COMMAND.COM ) instructions.

Operation

For the analysis of the text using parser usually a separate lexical scanner ( lexer also called ). This breaks down the ( present as a simple sequence of characters ) input data into tokens ( input symbols or "words" understood by the parser ); because the decomposition follows in token of a regular grammar, the scanner is usually a finite state machine. These tokens are used as input atomic character of the parser. Parsers that do not use a separate scanner, Scannerless parsers are called.

The actual parser as an implementation of an abstract machine ( usually implemented as a pushdown automaton ), however, cares about the grammar of the input, performs a syntax check of the input data, and usually created from the data of a derivation tree (similar to the English sometimes called Parse tree called ). This is then used for further processing of the data; Typical applications are the semantic analysis, code generation in a compiler or execution by an interpreter.

In HTML, a lexical scanner would the HTML file in HTML tags and body text break down and pass these components to the parser - ie the scanner "interested" only the appearance of syntax elements ("if it is enclosed in angle brackets, it is an HTML tag "). The parser processes the other hand, syntactic contexts, that is investigated belong together which pairs of tags and how the tags are nested; the substantive meaning of the tags are not interested in the parser, however, but is being considered by the subsequent processing.

Clearly shown is a parser that software which checks processed and passes the instructions in the source code of the user.

Parser - types

There are different parse method. This is according to general procedure, ie the distinction based on the order in which the nodes of the derivation tree to be created (top-down, and theory -driven parsing or bottom-up, and input -driven parsing and left corner ), specific procedure (LL, LR, SLR, LALR, LC, ...) and implementation technique ( recursively descending recursively ascending or table-driven ) distinguished. Further, it is also distinguished by grammarians Kart.

Parser for context-free grammars

Here are a few based on context-free grammars method:

  • Top-down parser LF parser
  • LL parser
  • LR parsers
  • SLR parser
  • LALR parser
  • LC parser

Parser for context-sensitive grammars

  • Packrat parser ( Parsing Expression Grammars )

The parsing of well-defined artificial languages ​​(see formal languages ​​, programming languages) is less complex than parsing free -grown natural languages ​​such as English or German, which are characterized by a large number of ambiguities, anomalies and inconsistencies. See also Computational Linguistics.

Note: Parsing The term should not be confused with the term compile. The latter produces a target code from source code, it is also parsed among other things, but beyond that further action will take place.

Example

Parsers are frequently used to make a sequence of symbols a tree structure. A typical example of this is math expressions such. This term, as it appears here, there is only first of a series of symbols:

The task of the parser is now recognizing the underlying structure of this symbol sequence. Often this takes the form of a parse tree, which may look like in this case as:

This is the output of a simple parser. This issue can now be analyzed by other programs.

634615
de