Extended Backus–Naur Form

The Extended Backus-Naur Form, EBNF for short, is an extension of the Backus-Naur Form (BNF). The extension was originally introduced by Niklaus Wirth to describe the syntax of the programming language Pascal. EBNF is a formal metasyntax (metalanguage) used to represent context-free grammars.

EBNF is standardized by the ISO as ISO/IEC 14977:1996(E). The examples in this article follow the ISO standard. Occasionally, other extended variants of BNF are also called EBNF.

Basics

A text such as the source code of a computer program initially consists of terminal symbols, that is, of visible characters: letters, digits, punctuation, spaces, etc.

EBNF defines production rules in which sequences of symbols are each assigned to a nonterminal symbol, such as:

    DigitExceptZero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
    Digit = "0" | DigitExceptZero;

In each of these production rules, the nonterminal symbol being defined stands on the left-hand side. The vertical bar represents an alternative, terminal symbols are enclosed in quotation marks, and each rule ends with a semicolon as terminator. A Digit is thus either a 0 or a DigitExceptZero, which in turn is 1 or 2 or 3 and so on up to 9.
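As an informal illustration, the two rules can be mirrored in Python by treating each alternative as membership in a set of characters. This is only a sketch; the names DIGIT_EXCEPT_ZERO, DIGIT and is_digit are chosen for this example and do not come from the standard.

```python
# DigitExceptZero = "1" | "2" | ... | "9";
# Each alternative becomes one member of a set of terminal symbols.
DIGIT_EXCEPT_ZERO = set("123456789")

# Digit = "0" | DigitExceptZero;
# The alternative "|" corresponds to a set union.
DIGIT = {"0"} | DIGIT_EXCEPT_ZERO


def is_digit(ch: str) -> bool:
    """Return True if ch is exactly one terminal symbol matched by Digit."""
    return ch in DIGIT
```

Because the sets contain only single characters, a longer string such as "12" is not accepted as a Digit.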

A production rule may also contain a sequence of terminal or nonterminal symbols, whose parts are joined by commas, such as:

    Twelve = "1", "2";
    Zweihundertundeins = "2", "0", "1";
    Dreihundertzwoelf = "3", Twelve;
    ZwoelfTausendzweihunderteins = Twelve, Zweihundertundeins;

Expressions that may be omitted or repeated can be written with curly braces { ... }:

    NatuerlicheZahl = DigitExceptZero, { Digit };

This matches the strings 1, 2, ..., 10, ..., 12345, .... Note that everything inside the braces may occur any number of times, including not at all.
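The repetition in curly braces corresponds naturally to a loop that may run zero times. The following minimal sketch recognizes a NatuerlicheZahl at a given position in a string; the function name and its return convention are assumptions made for this example.

```python
from typing import Optional


def match_natural_number(text: str, pos: int = 0) -> Optional[int]:
    """Return the index just past a natural number starting at pos, or None."""
    # DigitExceptZero: exactly one leading digit from 1 to 9 is required.
    if pos >= len(text) or text[pos] not in "123456789":
        return None
    pos += 1
    # { Digit }: zero or more further digits, consumed by a loop.
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    return pos
```

For the input "7" the loop body never runs, which is the "not at all" case of the repetition.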

An option may be indicated in square brackets [...]:

    Integer = "0" | [ "-" ], NatuerlicheZahl;

An integer is thus either zero (0) or a natural number that may optionally be preceded by a minus sign. This matches all integers such as 0, -3, 1234, etc.
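For comparison, the same rule can be written as a regular expression, where the option in square brackets corresponds to the `?` quantifier and the repetition in curly braces to `*`. This is a sketch using Python's standard `re` module; the names INTEGER_PATTERN and is_integer are illustrative.

```python
import re

# Integer = "0" | [ "-" ], NatuerlicheZahl;
#   "0"              ->  0
#   [ "-" ]          ->  -?
#   DigitExceptZero  ->  [1-9]
#   { Digit }        ->  [0-9]*
INTEGER_PATTERN = re.compile(r"0|-?[1-9][0-9]*")


def is_integer(text: str) -> bool:
    """Return True if the whole string is derivable from the Integer rule."""
    return INTEGER_PATTERN.fullmatch(text) is not None
```

Note that, under this rule, strings such as "-0" or "007" are not integers, because a minus sign or further digits may only accompany a number that starts with a non-zero digit.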

It is also possible to require a fixed number of repetitions.

    LeerzeichenAlsTab = 4 * " ", "Yes";

Here the symbol " " is expected four times before the character string "Yes".
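The repetition factor behaves much like string multiplication in Python. The sketch below assumes that the repeated symbol is a single space, as the rule name (roughly "spaces as tab") suggests; the constant and function names are made up for this example.

```python
# LeerzeichenAlsTab = 4 * " ", "Yes";
# The factor 4 * " " expands to exactly four spaces (assumed symbol: one space),
# and the comma corresponds to concatenation.
SPACES_AS_TAB = 4 * " " + "Yes"


def matches_spaces_as_tab(text: str) -> bool:
    """Return True if text is exactly four spaces followed by "Yes"."""
    return text == SPACES_AS_TAB
```

Unlike the curly braces, the factor fixes the count: three or five spaces do not match.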

Motivation for extending BNF

BNF requires rather cumbersome constructs to represent optional elements, i.e. elements that may be omitted, as well as repeated elements. In the specification of PL/I, square brackets "[ ... ]" had already been used for options. In his definition of the Pascal language, Niklaus Wirth additionally introduced curly braces "{ ... }" for repetitions and called the result extended BNF.

Every construct in an EBNF syntax can also be expressed in BNF. Wirth created EBNF for the sake of better readability and a more compact notation.

Number defined in BNF

A number is a sequence of digits with an optional leading minus sign. In BNF, several alternatives and a recursion are needed to express the repetition of digits:

BNF

    <Number> ::= <PositiveNumber> | - <PositiveNumber> | 0
    <PositiveNumber> ::= <DigitExceptZero> <OptionalDigitSequence>
    <OptionalDigitSequence> ::= <Digit> <OptionalDigitSequence> |

Read: A number is either a positive number, or a minus sign followed by a positive number, or the character zero. A positive number is a digit other than zero followed by an optional digit sequence. An optional digit sequence is a digit followed by an optional digit sequence, or it is empty.

In EBNF this can be expressed in a single rule without recursion:

EBNF

    Number = [ "-" ], DigitExceptZero, { Digit } | "0";

Read: A number consists of an optional minus sign, followed by a digit other than zero, followed by any number of further digits (possibly none). Or: a number consists of the character zero.

The minus sign may be omitted, and the repetition may also occur zero times (optional repetition). EBNF thus needs only a single rule with one alternation and no recursion, whereas the BNF version needs three rules, one of which is recursive (it refers to itself in its own definition).

EBNF marks terminal symbols with quotation marks and uses a terminator character. Nonterminal symbols are not enclosed in angle brackets; because of the quotation marks, they cannot be confused with terminal symbols.
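The difference between the two formulations can be made concrete with two small recognizers: one follows the recursive BNF rule for the optional digit sequence, the other uses a loop in the spirit of the EBNF braces. This is an illustrative sketch only; all function names are assumptions of this example.

```python
DIGITS = "0123456789"
NONZERO = "123456789"


def optional_digit_sequence(text: str, pos: int) -> int:
    """BNF style: a digit followed by an optional digit sequence, or empty."""
    if pos < len(text) and text[pos] in DIGITS:
        return optional_digit_sequence(text, pos + 1)  # recursive alternative
    return pos  # empty alternative


def number_bnf(text: str) -> bool:
    """Recognize a number using the recursive helper rule."""
    if text == "0":
        return True
    pos = 1 if text.startswith("-") else 0
    if pos >= len(text) or text[pos] not in NONZERO:
        return False
    return optional_digit_sequence(text, pos + 1) == len(text)


def number_ebnf(text: str) -> bool:
    """Recognize a number using a loop for { Digit } instead of recursion."""
    if text == "0":
        return True
    pos = 1 if text.startswith("-") else 0  # [ "-" ]
    if pos >= len(text) or text[pos] not in NONZERO:
        return False
    pos += 1
    while pos < len(text) and text[pos] in DIGITS:  # { Digit }
        pos += 1
    return pos == len(text)
```

Both functions accept the same strings; the loop version simply avoids the extra helper rule, much as the EBNF rule does.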

Other additions and modifications

The EBNF eliminates some of the weaknesses of the BNF:

  • BNF itself uses the symbols (<, >, |, ::=). If these occur in the language being defined, BNF cannot be used without modification or explanation.
  • Strictly speaking, a BNF syntax may contain only single-line rules.

The EBNF solves these problems:

  • Terminal symbols are always written in quotation marks ("..." or '...'). The angle brackets ("<...>") around nonterminal symbols can then be dropped.
  • A terminator, usually a semicolon, marks the end of each rule.

In addition, EBNF provides extension mechanisms, a way to specify the number of repetitions, the exclusion of alternatives (for example, all characters except quotation marks), comments, etc.

Despite all its extensions, EBNF is not "more powerful" than BNF in terms of the languages it can define. In principle, every grammar defined in EBNF can also be expressed by BNF rules, although this often yields a considerably longer description.

In some contexts, any extended BNF is called EBNF. For example, the W3C uses an EBNF for the specification of XML.
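The equivalence rests on the fact that each option or repetition can be replaced by an additional helper rule. The following sketch shows one possible way to generate such helper rules; the grammar representation (a dictionary mapping a nonterminal name to a list of alternatives), the function names, and the `<empty>` marker are assumptions of this example, not part of BNF or EBNF.

```python
from typing import Dict, List

# A grammar maps each nonterminal to its alternatives;
# each alternative is a list of nonterminal names, [] being the empty one.
Grammar = Dict[str, List[List[str]]]


def expand_repetition(helper: str, item: str) -> Grammar:
    """{ item }  becomes  helper ::= item helper | (empty)."""
    return {helper: [[item, helper], []]}


def expand_option(helper: str, item: str) -> Grammar:
    """[ item ]  becomes  helper ::= item | (empty)."""
    return {helper: [[item], []]}


def to_bnf(grammar: Grammar) -> List[str]:
    """Render the rules in BNF notation, writing <empty> for the empty alternative."""
    lines = []
    for name, alternatives in grammar.items():
        rendered = [
            " ".join(f"<{symbol}>" for symbol in alt) or "<empty>"
            for alt in alternatives
        ]
        lines.append(f"<{name}> ::= " + " | ".join(rendered))
    return lines
```

For example, `to_bnf(expand_repetition("OptionalDigitSequence", "Digit"))` yields a single recursive rule of the same shape as the helper rule needed in the BNF definition of a number.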

Example

A simple programming language which allows only assignments can be defined in EBNF as:

A syntactically correct program would then be:

    PROGRAM DEMO1
    BEGIN
      A0:=3;
      B:=45;
      H:=-100023;
      C:=A;
      D123:=B34A;
      DONKEY:=GIRAFFE;
      TEXTLINE:="Hello, world!";
    END.

The language can easily be extended with control structures, arithmetic expressions, and input and output statements. This would already yield a usable, small programming language.

The following characters, which the standard recommends as the normal representation, have been used here:
