CYK algorithm

The Cocke - Younger - Kasami algorithm ( CYK algorithm ) is an algorithm from the field of theoretical computer science. With it can determine whether a word in a certain context-free language belongs. In technical terms this is called solving the word problem for context-free languages. With the help of backtracking of the parse tree and the parse trees of a given word of the language can be constructed. To apply the algorithm, must be submitted to the given language a grammar in Chomsky normal form. The independently developed in the 1960s by Itiroo Sakai, John Cocke, Tadao Kasami, Jacob Schwartz and Daniel Younger algorithm uses the principle of dynamic programming.

Description

This description follows Hopcroft / Ullman (1996).

As input the algorithm receives a context-free grammar in Chomsky normal form and the test word. Now, for each part of a word (ie: starts with the index and has length ) the set of nonterminals calculated that generate, denoted by.

According to the principle of dynamic programming are only for the smallest part of words calculated, stored and then used to thus efficient computation of the next larger subwords. The smallest part words are single letters. Since the context-free grammar in Chomsky normal form is given, each letter can only be derived in exactly one step from a nonterminal.

A nonterminal of a grammar in Chomsky normal form can not be derived in several terminals in one step. Therefore, a sub-word, that contains more than one character can be produced by only one rule. Since nonterminals can not generate the empty word ( ε ) has to generate the left and right part of. It follows:

In other words, capable of generating, when it can be deduced according to the rules and production, and in turn, to be derived and, therefore.

The word problem can be decided now simple: can then be accurately generated by the grammar if and only if. In are all variables that can generate the partial word that begins with the first letter and its length, so the whole word.

Algorithm

From the description of results in the following algorithm:

For i = 1 ... n    For each production      If r =        Set For j = 2 ... n    For i = 1 ... n - j 1      For k = 1 ... j - 1        Set If, stop and enter "w is generated by G" from Stop and give "w is not generated by G" from example

The issue is whether the word can be generated by the grammar. The productions of the grammar are:

The algorithm can be carried out by means of a table. It is stored in the -th row and th column, that is the amount of non- terminal symbols, which make up the sub-word can be derived, which begins in the index and its length.

Because, can the given word under the grammar derived from. So a word of the language.

Complexity

The algorithm decides in time, if a word of length in the language is ( cf. Landau symbols to describe the notation). This space is required in the order.

195505
de