Wavelet Tree

In computer science, a wavelet tree is a compact data structure for storing strings in compressed form. It extends the rank and select operations defined on bit vectors to arbitrary alphabets.

The data structure was first described as a core component of compressed full-text indexes and is regarded as a slight generalization of a data structure from computational geometry. A wavelet tree can be described recursively: each node distributes its string between its two children, and the remaining alphabet is divided among them. A bit vector in the node records, for each character, which partition it was assigned to.

The name derives from the wavelet transform, which is used to reduce image data and for the approximate evaluation of relational algebra expressions.

Construction

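A wavelet tree of a sequence s over an alphabet Σ can be described recursively in terms of a subalphabet of Σ. A wavelet tree over a (sub)alphabet is a balanced binary tree with one leaf per symbol of that alphabet. Each internal node splits its alphabet into two halves and passes the characters of each half, in their original order, to the corresponding child.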
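The recursive description can be sketched in Python as follows. This is a minimal pointer-based illustration, not a succinct implementation: the function name build and the dictionary keys are hypothetical, bit vectors are plain lists, and characters are first mapped to integer codes in sorted order (an assumption made for the example).

```python
def build(seq, lo, hi):
    """Build a wavelet tree node for integer symbols in the range [lo..hi].

    Illustrative sketch: returns nested dicts; a leaf is represented by None
    because its single symbol is implied by the range lo == hi.
    """
    if lo == hi:
        return None
    mid = (lo + hi) // 2                      # split the remaining alphabet in half
    return {
        # one bit per character: 0 = goes to the left child, 1 = to the right
        "bits": [0 if c <= mid else 1 for c in seq],
        "left": build([c for c in seq if c <= mid], lo, mid),
        "right": build([c for c in seq if c > mid], mid + 1, hi),
    }


text = "abracadabra"
alphabet = sorted(set(text))                  # ['a', 'b', 'c', 'd', 'r']
codes = [alphabet.index(ch) for ch in text]   # a=0, b=1, c=2, d=3, r=4
root = build(codes, 0, len(alphabet) - 1)
print(root["bits"])                           # [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
```

At the root, the characters a, b and c receive a 0 and are passed to the left child, while d and r receive a 1 and are passed to the right child.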

Properties

For a string of length n over an alphabet of σ symbols, the tree described above has height ⌈log σ⌉, σ leaves and σ − 1 internal nodes. It stores n bits on each level and at most n bits on the lowest level. The tree can therefore be represented with at most n⌈log σ⌉ bits. Strictly speaking, this representation requires additional bits for the pointers.

Thus, a string s of length n over the alphabet Σ can be represented as a wavelet tree with n⌈log σ⌉ + O(σ log n) bits.
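As a quick arithmetic check of the size bound, the following sketch computes the height and the upper bound on the bit-vector payload of a balanced wavelet tree. The function name wavelet_tree_size is hypothetical, and pointer overhead is deliberately left out.

```python
def wavelet_tree_size(n, sigma):
    """Return (height, payload_bits) for a balanced wavelet tree.

    Counts only the bit vectors (at most n bits per level); pointers are ignored.
    """
    height = (sigma - 1).bit_length()   # equals ceil(log2(sigma)) for sigma >= 1
    return height, n * height


print(wavelet_tree_size(11, 5))   # "abracadabra": (3, 33)
```

For the string abracadabra (length 11, 5 distinct characters) this gives a height of 3 and at most 33 payload bits.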

Operations

The wavelet tree supports the operations access, rank and select in O(log σ) time, provided that a balanced tree was constructed.

Access

access(i): Direct access to the i-th element of the string.

To determine the character at position i, the bit at position i of the root's bit vector is inspected. If the bit is 0, the character belongs to the left half of the alphabet and the procedure continues recursively in the left child; otherwise it belongs to the right half and the algorithm continues in the right child. In either case, the position within the child's bit vector must be determined. If the bit is 0, the new position is the number of zeros in the bit vector up to position i. If the right child is processed, the ones up to position i are counted instead. Both counts are obtained with the rank function on the bit vector. The recursion ends in a leaf, whose symbol is the character sought.
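The descent can be sketched in Python under a few assumptions: positions are 0-based, symbols are integer codes, the bit-level rank is a plain linear scan, and the helper names (build, rank_bit, access) are hypothetical. The build helper is included only so that the example runs on its own.

```python
def build(seq, lo, hi):
    """Pointer-based wavelet tree as nested dicts; a leaf is None."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    return {
        "bits": [0 if c <= mid else 1 for c in seq],
        "left": build([c for c in seq if c <= mid], lo, mid),
        "right": build([c for c in seq if c > mid], mid + 1, hi),
    }


def rank_bit(bits, b, i):
    """Number of bits equal to b in bits[0:i] (linear scan, not constant time)."""
    return sum(1 for x in bits[:i] if x == b)


def access(node, lo, hi, i):
    """Return the integer symbol stored at 0-based position i."""
    while lo != hi:
        mid = (lo + hi) // 2
        b = node["bits"][i]
        i = rank_bit(node["bits"], b, i)      # 0-based position inside the child
        if b == 0:
            node, hi = node["left"], mid      # continue in the left half
        else:
            node, lo = node["right"], mid + 1 # continue in the right half
    return lo                                 # the leaf's range holds one symbol


text = "abracadabra"
alphabet = sorted(set(text))
codes = [alphabet.index(ch) for ch in text]
root = build(codes, 0, len(alphabet) - 1)
print(alphabet[access(root, 0, len(alphabet) - 1, 2)])   # r
```

Because positions are 0-based here, the number of equal bits strictly before position i is exactly the position of the character inside the child.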

The rank function on bit vectors can be evaluated in constant time using o(n) additional bits.
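The idea behind fast bit-vector rank is to precompute counts at regular intervals and to count only inside one small block per query. The sketch below is a simplified single-level version with a hypothetical class name and block size; it does not reach the o(n) space bound of the real structures, which use two levels and lookup tables or hardware popcount.

```python
class RankBitVector:
    """Bit vector with precomputed per-block counts of ones (illustrative)."""

    BLOCK = 8  # block size chosen arbitrarily for the example

    def __init__(self, bits):
        self.bits = list(bits)
        # block_ones[k] = number of ones in bits[0 : k * BLOCK]
        self.block_ones = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[start:start + self.BLOCK])
            self.block_ones.append(self.block_ones[-1] + ones)

    def rank1(self, i):
        """Number of ones in bits[0:i]."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        # stored count for the full blocks plus a scan of at most BLOCK - 1 bits
        return self.block_ones[block] + sum(self.bits[start:start + offset])

    def rank0(self, i):
        """Number of zeros in bits[0:i]."""
        return i - self.rank1(i)


bv = RankBitVector([0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0])
print(bv.rank1(10), bv.rank0(10))   # 3 7
```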

Rank

rank(q, i): Number of occurrences of the character q up to position i in the string.

Determining the rank is analogous to the access operation, except that the descent follows the path to the leaf of q instead of the bit found at the current position. The position reached in the leaf is the rank, that is, the number of occurrences of q up to position i.
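A minimal sketch of this descent, assuming integer symbol codes and a linear-scan bit count in place of a constant-time rank structure. Here rank(q, i) counts occurrences among the first i characters, which matches "up to position i" for 1-based positions. All function names are hypothetical, and build is repeated only to keep the example self-contained.

```python
def build(seq, lo, hi):
    """Pointer-based wavelet tree as nested dicts; a leaf is None."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    return {
        "bits": [0 if c <= mid else 1 for c in seq],
        "left": build([c for c in seq if c <= mid], lo, mid),
        "right": build([c for c in seq if c > mid], mid + 1, hi),
    }


def rank(node, lo, hi, q, i):
    """Occurrences of symbol q among the first i characters of the string."""
    while lo != hi and i > 0:
        mid = (lo + hi) // 2
        b = 0 if q <= mid else 1              # the side of the alphabet q lies on
        i = sum(1 for x in node["bits"][:i] if x == b)   # bit-level rank
        if b == 0:
            node, hi = node["left"], mid
        else:
            node, lo = node["right"], mid + 1
    return i                                  # position reached in the leaf of q


text = "abracadabra"
alphabet = sorted(set(text))
codes = [alphabet.index(ch) for ch in text]
root = build(codes, 0, len(alphabet) - 1)
print(rank(root, 0, len(alphabet) - 1, alphabet.index("a"), 11))   # 5
```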

Select

select(q, i): Position of the i-th occurrence of the symbol q in the string.

To determine this position, the algorithm starts at the leaf that represents q and walks up to the root. If the current node is a left child, the new position in the parent node is the position of the i-th 0 in the parent's bit vector. If it is a right child, the new position is the position of the i-th 1. This select operation on a bit vector can be performed in constant time with o(n) additional bits.
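The upward walk can be sketched with a recursive function that first descends to the leaf of q and then maps the position back on the way up, which is equivalent to traversing from the leaf to the root. The assumptions are integer symbol codes, a 1-based occurrence count k, a 0-based result, and a linear-scan bit-level select. All names are hypothetical.

```python
def build(seq, lo, hi):
    """Pointer-based wavelet tree as nested dicts; a leaf is None."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    return {
        "bits": [0 if c <= mid else 1 for c in seq],
        "left": build([c for c in seq if c <= mid], lo, mid),
        "right": build([c for c in seq if c > mid], mid + 1, hi),
    }


def select_bit(bits, b, k):
    """0-based position of the k-th (1-based) bit equal to b; linear scan."""
    seen = 0
    for pos, x in enumerate(bits):
        if x == b:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k matching bits")


def select(node, lo, hi, q, k):
    """0-based position of the k-th (1-based) occurrence of symbol q."""
    if lo == hi:
        return k - 1                          # inside the leaf, occurrence k is at index k - 1
    mid = (lo + hi) // 2
    if q <= mid:                              # q lives in the left child: look for zeros
        child_pos = select(node["left"], lo, mid, q, k)
        return select_bit(node["bits"], 0, child_pos + 1)
    child_pos = select(node["right"], mid + 1, hi, q, k)
    return select_bit(node["bits"], 1, child_pos + 1)


text = "abracadabra"
alphabet = sorted(set(text))
codes = [alphabet.index(ch) for ch in text]
root = build(codes, 0, len(alphabet) - 1)
print(select(root, 0, len(alphabet) - 1, alphabet.index("a"), 3))   # 5
```

If q occurs fewer than k times, the bit-level select raises a ValueError in this sketch, provided the alphabet has at least two symbols.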

Compression

The space consumption can be reduced by removing redundancy with various methods, either to fewer bits with the same running time for the operations, or to a bound that allows constant running time for rank and select.

Application

This data structure is used in a variety of applications, in which wavelet trees represent one of three different classes of objects.

Sequence of values

The wavelet tree represents a string. The operations used are the three basic operations on the tree described above. This is the most widely used representation.

Sorting

The tree describes a sorted representation of the original string. The leaves of the tree represent the sorted sequence. This leads to two additional operations. The first returns the position that the character at position i occupies in the sorted sequence. Conversely, ascending from the r-th leaf to the root yields the position i of the element with rank r. This representation was used by the inventor of wavelet trees.

Grid of points

Here, the wavelet tree represents a set of points.

Extensions

The literature describes several extensions of wavelet trees. To reduce the height of the tree, t-ary nodes can be used instead of binary ones. The node degree then increases to t children and the depth of the tree decreases. Adding update operations on the string, such as inserting and deleting characters at arbitrary positions, makes wavelet trees dynamic and enables support for dynamic FM-indexes.
