Binary search tree

In computer science, a binary search tree is a combination of the abstract data structures search tree and binary tree. A binary search tree, often abbreviated BST (from English: binary search tree), is a binary tree in which every node carries a key and in which the keys of a node's left subtree are all less than (or equal to), and the keys of its right subtree all greater than (or equal to), the key of the node itself.

What the terms "less than" and "greater than" mean is entirely up to the user; they only have to establish a total ordering (more precisely: a total quasi-order, see below). The order relation is realized most flexibly by a 3-way comparison function to be supplied by the user. Whether the key is a single field or a combination of fields is likewise up to the user, as is the question of whether duplicates (distinct elements that nevertheless compare as "equal") are to be permitted or not. The search functions for this case are given below.

An in-order traversal of a binary search tree is equivalent to walking through a sorted list (with essentially the same runtime behavior). In other words, a binary search tree offers considerably more functionality (for example, traversal in the opposite direction and/or "direct access" with potentially logarithmic runtime, achieved through the divide-and-conquer principle, which exploits the long-distance effect of the law of transitivity) at the same or only slightly higher memory cost.


Terminology

The terms node and edge, the latter by definition a directed edge (also arc or arrow), are taken from general graph theory. When the direction is clear enough from the context, edge suffices.

In a directed graph, a node can be assigned an out-degree as well as an in-degree. Binary trees are usually understood as out-trees. In such a rooted tree there is exactly one node with in-degree 0; it is called the root. All other nodes have in-degree 1. The out-degree is the number of child nodes, and in the binary tree it is limited to at most 2, so that as an out-tree its order is ≤ 2.

Nodes with out-degree ≥ 1 are called internal (inner) nodes, those with out-degree 0 leaves or external (outer) nodes. For binary trees - and only there - the name half-leaf is sometimes found for a node with out-degree 1 (English sometimes: non-branching node). A leaf is then a double half-leaf.

Different ways of speaking

In Figure 1, the nodes of the binary tree have been assigned keys. Since the in-order traversal follows the (alphabetical) sort order, the tree is a binary search tree.

Figure 2 shows the same binary search tree, but now with explicit "NIL nodes". Here only the internal nodes carry keys, whereas the external nodes are NIL nodes that act as placeholders for the intervals between the keys (as, for example, in Figure 4), i.e., as insertion points of the binary search tree. Nodes with out-degree 1 do not occur in this view. The keyless search tree consists of exactly one node, which is external and at the same time the root. Since in this view the height of the "truly empty" tree (which is not a search tree) is defined as −1, so that the keyless tree gets height 0, the notions of level agree with the view of Figure 1, where the empty and at the same time keyless tree is assigned height 0.

Node-oriented and leaf-oriented storage

The storage is called node-oriented if the elements of the set are stored in the internal nodes and the external leaves are empty. In contrast, with leaf-oriented storage the elements of the set are stored in the leaves, and the internal nodes contain only signposts for navigation, which may have nothing to do with the keys of the set.
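As a small illustration, the two storage types might be modeled as follows in Python; this is only a sketch, and the class and field names are hypothetical.

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class NodeOrientedNode:
    # Node-oriented storage: the element itself lives in the internal node.
    element: Any
    left: Optional["NodeOrientedNode"] = None
    right: Optional["NodeOrientedNode"] = None

@dataclass
class LeafOrientedNode:
    # Leaf-oriented storage: internal nodes carry only a router key for navigation;
    # the elements themselves are stored exclusively in the leaves.
    router: Any
    left: Optional["LeafOrientedNode"] = None
    right: Optional["LeafOrientedNode"] = None
    element: Any = None   # set only in leaves (left == right == None)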

Disambiguation


Order relation

For binary search, sorting, etc. to work, the order relation has to be a total preorder, i.e., a total quasi-order (English: total preorder). The names for the induced duplicate relation and the induced strict order are used as given there.

The trichotomy of a corresponding 3-way comparison function, of whose result only the sign is relevant, is derived as follows: for two keys x and y, exactly one of the three cases x < y, x ≈ y (duplicate) and x > y applies, where x < y stands for "x ≲ y and not y ≲ x" and x ≈ y for "x ≲ y and y ≲ x".

After interchanging x and y and rearranging, one sees that the third case is the mirror image of the first.

The induced order relation < is a strict weak ordering (English: strict weak ordering). On the equivalence classes of this relation, more precisely the equivalence classes of the duplicate relation ≈, it induces a strict total order.

Obviously, any such order can be mirrored, i.e., "less" and "greater" interchanged; the result is again a strict weak ordering. Neighborhood relations remain intact; only "smaller" is exchanged with "greater", or "left" with "right".
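As an illustration, the following minimal sketch in Python (all names are hypothetical) shows a user-supplied 3-way comparison function for a composite key of surname and first name; only the sign of its result matters, and records that differ only in capitalization come out as duplicates:

def cmp3(a, b):
    # 3-way comparison: negative for "less", 0 for "equal" (duplicate), positive for "greater".
    ka = (a[0].lower(), a[1].lower())
    kb = (b[0].lower(), b[1].lower())
    if ka < kb:
        return -1
    if ka > kb:
        return +1
    return 0      # equal with respect to the ordering; the records may still differ

print(cmp3(("Meyer", "Anna"), ("Meyer", "Paul")))    # -1
print(cmp3(("Meyer", "anna"), ("MEYER", "Anna")))    #  0, a duplicate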

Notes:

  • For pure searching, a 2-way comparison function that merely indicates whether two "keys" are equal or not would basically be enough. With such a comparison function, however, efficient search times, for example logarithmic on average, cannot be achieved.
  • For the functioning of sorting and binary searching it is irrelevant whether the split of the "not equal" path of the 2-way comparison function into a "smaller" and a "greater" branch is an artifact, such as the ordering of the letters in an alphabet, or whether a nearer/farther or a logical relationship is involved.
  • If the order relation also reflects neighborhood, this can be exploited for an efficient proximity ("fuzzy-like") search.
  • The binary branching of binary trees fits exactly the search with the 3-way comparison function.
  • As explained in more detail in the following section "Search", the treatment of duplicates is independent of whether the order relation allows duplicates (total quasi-order and strict weak ordering) or not (total order or strict total order). On the one hand, it may be undesirable to have duplicates in the tree even if the relation allows them; on the other hand, it may well be appropriate to take duplicates into the tree, e.g., from the input stream, even with a total order. In practical applications it therefore only matters whether duplicates can occur in the tree or not. Consequently, total quasi-orders are assumed here from the outset.

Search

The search for an entry proceeds as follows: the search key is first compared with the key of the root. If the two are equal, the entry (or a duplicate) has been found.

Otherwise (in the "not equal" case) it is checked whether the search key is smaller (larger) than the key in the node: if so, the search continues recursively in the left (right) subtree of the root; if there is no left (right) subtree, the entry sought does not exist in the binary search tree.

The final "not equal" node reached in this way constitutes, together with the last comparison direction, the so-called insertion point for the element sought. (In the view of Figure 2 the external node suffices; the direction is then unnecessary.) If the element is inserted here, the in-order, i.e., conformity with the sort order, is preserved. The key of the final "not equal" node is either the largest among the smaller keys or the smallest among the larger ones. The mirror image holds for its neighboring node in the last comparison direction, if such a node exists. (That node, however, is not an "insertion point" but an ancestor of one.)

Search without duplicates

The following pseudocode Find illustrates the operation of the algorithm for a search in which duplicates are not to be taken into the tree under any circumstances. (This is ultimately independent of whether the order relation allows duplicates or not.)

The function returns a node and a comparison result. Unless the comparison result is "Equal", this pair constitutes an insertion point.

Find(t, s) {
   t: binary search tree
   s: search key
   return Find0(t, s, t.root, Empty)
   // returns a node and a comparison result
}

Find0(t, s, x, d) {
   t: subtree
   s: search key
   x: node
   d: comparison result (Equal, LessThan, GreaterThan, or Empty)
   if x ≠ null then {
      if s = x.key then return (x, Equal)               // search key s found
      if s < x.key
         then return Find0(x, s, x.left, LessThan)
         else return Find0(x, s, x.right, GreaterThan)
   }
   return (t, d)                                        // search key s not found
   // returns (tree, Empty) or (node, comparison result)
}

In the example chosen in the figure, Find returns for the search key s the first hit "F" as its result.
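For comparison, a minimal runnable counterpart in Python; this is only a sketch, and the node structure with the attributes key, left and right is an assumption of this example:

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find(root, s):
    # Returns (node, 'Equal') on a hit; otherwise the insertion point as
    # (last visited node, 'LessThan' or 'GreaterThan'); (None, 'Empty') for the empty tree.
    node, result = None, 'Empty'
    x = root
    while x is not None:
        if s == x.key:
            return x, 'Equal'
        node = x
        if s < x.key:
            x, result = x.left, 'LessThan'
        else:
            x, result = x.right, 'GreaterThan'
    return node, result

t = Node(4, Node(2), Node(6))
hit, res = find(t, 5)
print(hit.key, res)     # 6 LessThan -> a key 5 would be inserted as the left child of 6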

Search with duplicates

If duplicate entries are to be allowed in the tree, it is advantageous not to stop the search at the first matching node, but to continue searching towards the leaves on either the smaller or the larger side. This supports targeted insertion of duplicates and is particularly interesting when the search tree is not only to be searched but possibly also traversed. Using the following search functions in the sorting method binary tree sort makes that method "stable" (see stability (sorting method) with explanatory examples).

In the following pseudocode FindDupGE the direction "right" is prescribed as the side on which a possible new duplicate is to be inserted.

FindDupGE(t, s, c) {
   t: binary search tree
   s: search key               // FindDupGE: if s is a duplicate,
                               // it is to be inserted on the right.
   c: cursor {                 // at the end this cursor contains:
      c.d: direction           // (1) Left, Right, or Empty
      c.n: node                // (2) node, or tree (only if Empty)
   }
   c.n = t                     // for the empty tree
   return FindDupGE0(t.root, s, c, Empty)
   // returns an insertion point in the cursor c
}

FindDupGE0(x, s, c, d) {
   x: node
   s: search key
   c: cursor
   d: direction
   if x ≠ null then {
      c.n = x
      if s ≥ x.key             // search key s ≥ node key?
         then return FindDupGE0(x.right, s, c, Right)   // ≥ (GE)
         else return FindDupGE0(x.left, s, c, Left)     // <
   }
   c.d = d                     // set insertion direction
   return c
   // returns an insertion point in the cursor c
}

Figure 3: insertion point at the rightmost duplicate of s.

The pair (node, direction) of the previous example Find is combined here into one object, called cursor. It is a pure output parameter that specifies the insertion point.

FindDupGE is written so that its result cursor always delivers a direct insertion point. From the result, however, it is not readily apparent whether a duplicate is involved, since the insertion point need not carry this key even if the key occurs in the tree; this depends on the more or less random arrangement of the nodes in the tree. If the rightmost duplicate (in the example of Figure 3 the "R") is a right half-leaf, then it constitutes, together with the direction Right, the insertion point; if it is not, then it is, in the hierarchy of the binary tree, an ancestor of the right neighbor of "R", which is then a left half-leaf and, together with the direction Left, specifies the same insertion point and is therefore in this case the result of FindDupGE.

Whereas Find may have to query all 3 outcomes of the comparison function, FindDupGE makes do with querying 2 of them.
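A compact counterpart in Python, a sketch under the same assumptions as above (nodes with key, left and right attributes); the cursor is modeled simply as a (node, direction) pair:

def find_dup_ge(root, s):
    # Returns the insertion point (node, 'Right' or 'Left') for s, choosing the
    # position to the right of all duplicates; (None, 'Empty') for the empty tree.
    node, direction = None, 'Empty'
    x = root
    while x is not None:
        node = x
        if s >= x.key:                    # ">=": duplicates are passed on the right
            x, direction = x.right, 'Right'
        else:
            x, direction = x.left, 'Left'
    return node, direction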

The following pseudocode FindDup combines the capabilities of Find and FindDupGE by providing both a statement about the existence of the search key and an insertion point for duplicates. For this purpose the user specifies a direction d (Left or Right) on which side a possible new duplicate is to be inserted. The result is a pair (node, cursor), where node indicates whether and where the search key was found.

The proposal shown builds on an object called cursor (in analogy, for example, to databases). The cursor contains the entire path from the result node up to the root. It thus matches the version of the subsequent in-order traversal function Next that manages without father pointers. The appropriate data structure for the path is the stack (English: stack), with the operations push and pop. The overflow condition of push is not treated explicitly in this proposal; in any case it could only be answered with an abnormal termination, which the push function can just as well do on its own. With height-balanced trees the memory requirement for the stack is logarithmic in the number of nodes and thus quite manageable (worked example: AVL tree).

The somewhat simpler version of the function, in which each node carries a pointer to its father and the cursor therefore manages without a stack, simply lacks the push and clear calls. However, the memory requirement of the tree then grows by a fixed percentage.

FindDup(t, s, c, d) {
   t: binary search tree
   s: search key
   c: cursor {                 // at the end this cursor contains:
      c.d: direction           // (1) Left, Right, or Empty
      c.n: node                // (2) node, or tree (only if Empty)
      c.t: tree                // (3) tree (contains the pointer to the root)
      c.p: path                // (4) path from the node's father up to the root
   }
   d: direction                // if s is a duplicate, it is to be ...
   c.d = d                     // ... inserted on this side.
   c.t = t                     // initialize the cursor
   clear(c.p)                  // initialize the stack
   c.n = t                     // for the empty tree
   return FindDup0(t.root, s, c, Empty)
   // returns a node and an insertion point in the cursor c
}

FindDup0(x, s, c, d) {
   x: node
   s: search key
   c: cursor
   d: direction
   if x ≠ null then {
      push(c.p, c.n)           // father of x onto the stack
      c.n = x                  // set the new node in the cursor
      if s = x.key then return FindDup1(x, s, c, c.d)
      if s < x.key
         then return FindDup0(x.left, s, c, Left)
         else return FindDup0(x.right, s, c, Right)
   }
   c.d = d                     // set insertion direction
   return (null, c)            // search key s not found
   // returns null and an insertion point in the cursor c
}

FindDup1(q, s, c, d) {
   q: node                     // last node with Equal
   s: search key
   c: cursor
   d: direction
   x: node
   x = c.n.child[d]
   if x ≠ null then {
      push(c.p, c.n)           // father of x onto the stack
      c.n = x                  // set the new node in the cursor
      if s = x.key
         then return FindDup1(x, s, c, c.d)       // x is a new node with Equal
         else return FindDup1(q, s, c, 1 - c.d)   // on "not equal" continue in the opposite direction
   }
   c.d = d                     // set insertion direction
   return (q, c)
   // returns a duplicate and an insertion point in the cursor c
}

FindDup is written so that its result cursor always delivers a direct insertion point. If the search key was not found, the null pointer is returned in the node field. If the search key was found, FindDup delivers the leftmost or rightmost duplicate, in the example of Figure 3 the rightmost duplicate "R", as the found node. The insertion point may coincide with the found node; but it can also be its immediate neighbor (in the example of the figure: to the right), in which case it carries a different key.

In the first part, FindDup0, all three outcomes of the comparison function may be queried; once the presence of the search key has been confirmed, the second part, FindDup1, queries only 2 of them.
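The following sketch in Python (again assuming nodes with key, left and right) is not a line-by-line translation of FindDup, but it delivers the same kind of result in a single descent: the outermost duplicate on the chosen side, if any, plus a direct insertion point:

def find_dup(root, s, d='Right'):
    # Returns (hit, (node, direction)): hit is the rightmost (d='Right') or leftmost
    # (d='Left') node with key s, or None if s is absent; (node, direction) is a
    # direct insertion point for a new duplicate on side d.
    hit = None
    node, direction = None, 'Empty'
    x = root
    while x is not None:
        node = x
        if s == x.key:
            hit = x                          # outermost duplicate seen so far
            go_right = (d == 'Right')        # keep descending on the chosen side
        else:
            go_right = (s > x.key)
        if go_right:
            x, direction = x.right, 'Right'
        else:
            x, direction = x.left, 'Left'
    return hit, (node, direction)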

Complexity

Since the search operation proceeds along a path from the root to a leaf, the time spent depends, on average as well as in the worst case, linearly on the height h of the search tree (O(h)); only in the asymptotically negligible best case is the running time of Find constant, whereas FindDupGE and FindDup always take O(h).

In the degenerate case the height h is as large as the number of elements n. When building up a tree, which corresponds to a sorting run, each element may in the extreme case have to be compared with every other one - adding up to n·(n−1)/2 comparisons.

Weight-balanced search trees can achieve constant search time on average, but their worst-case behavior is linear. Height-balanced search trees have height O(log n) and thus guarantee logarithmic search time. Building up a tree then costs O(n log n) comparisons - equivalent to the best sorting algorithms.
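The two extremes can be illustrated with a small sketch in Python (the Node class and the helper names are assumptions of this example): the same keys once as a degenerate chain and once as a completely balanced tree:

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def chain(keys):
    # Degenerate tree: ascending keys form a pure right spine of height n - 1.
    root = None
    for k in reversed(keys):
        root = Node(k, None, root)
    return root

def balanced(keys):
    # Completely balanced tree over a sorted list of keys.
    if not keys:
        return None
    m = len(keys) // 2
    return Node(keys[m], balanced(keys[:m]), balanced(keys[m + 1:]))

def height(x):
    return -1 if x is None else 1 + max(height(x.left), height(x.right))

keys = list(range(127))
print(height(chain(keys)))      # 126: linear in n
print(height(balanced(keys)))   # 6:   logarithmic in n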

Logarithmic height even holds on average for randomly generated search trees if the following conditions are met:

  • All permutations of the inserted and deleted elements are equally likely.
  • When modifying the tree, "asymmetric" delete operations are avoided, i.e., the descents to the left and to the right during deletions balance out on average.

Search depths and path length sums

Let S := {s_1 < s_2 < … < s_n} be a set of keys from a totally (quasi-)ordered reservoir of keys; furthermore let b_i ≥ 0 be weights or frequencies for the access to the element s_i (or its equivalence class) and a_j ≥ 0 for the access to the interval (s_j, s_{j+1}), with s_0 := −∞ and s_{n+1} := +∞. (These are additional elements, not belonging to S, with the usual meaning.) The tuple

   (a_0, b_1, a_1, b_2, a_2, …, b_n, a_n)

is called the access distribution for the set S if all a_j, b_i ≥ 0. It is an access probability distribution if, in addition, the a_j and b_i sum to 1.

Now let T be a search tree for the set S with such an access distribution; furthermore let d(s_i) be the depth of the (internal) node s_i and d(l_j) the depth of the leaf l_j (Figure 4; terminology as in Figure 2). We consider the search for an element x. If x = s_i, we compare x with d(s_i) + 1 elements in the tree. If s_j < x < s_{j+1}, we compare x with d(l_j) elements in the tree. Thus

   P(T) := Σ_{i=1..n} b_i · (d(s_i) + 1) + Σ_{j=0..n} a_j · d(l_j)

is the path length of the tree weighted with the access distribution; if the access distribution is a probability distribution, the weighted path length is the weighted search depth, i.e., the average required number of comparisons. Figure 4 shows a search tree that is optimal for its access distribution.

If all b_i = 1 and all a_j = 0, then P(T) is the number of comparisons required when every node is searched for once (successful search). It is related to the so-called internal path length I(T) := Σ_{i=1..n} d(s_i) by

   P(T) = I(T) + n.

For all binary search trees:

   I_min(n) ≤ I(T) ≤ I_max(n) = n·(n−1)/2,

where the upper bound is attained by the linear chain and the lower one by the completely balanced binary trees. The function I_min is piecewise linear, monotonically increasing and convex from below.

Incidentally, the sequence of minimal values is related to entry A001855 in the OEIS.

If, on the other hand, all a_j = 1 and all b_i = 0, then P(T) is the number of comparisons needed to visit all leaves (unsuccessful search in every interval). It is also referred to as the external path length E(T) := Σ_{j=0..n} d(l_j).

For all binary trees:

   E(T) = I(T) + 2·n.

Proof: The assertion holds for n = 1 (I = 0, E = 2). Let T be a tree with n nodes. A new internal node is inserted in place of an external node at depth h and gains two new external children at depth h + 1. Thus I increases by h, and E by 2·(h + 1) − h = h + 2, i.e., the difference E − I grows by (h + 2) − h = 2, matching the growth of 2·n. ∎

The completely balanced binary trees are also optimal for this access distribution, and for all binary search trees:

   E_min(n) ≤ E(T) ≤ E_max(n),

with E_max(n) = I_max(n) + 2·n = n·(n + 3)/2, attained by the linear chain.

Incidentally, the sequence of minimal values E_min is related to entry A003314 in the OEIS, n + 1 being the number of external nodes (leaves).
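The relation between internal and external path length can be checked with a short sketch in Python; the explicit NIL leaves of the Figure-2 view are represented here by None:

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def internal_path_length(x, depth=0):
    # I(T): sum of the depths of all internal nodes.
    if x is None:
        return 0
    return depth + internal_path_length(x.left, depth + 1) + internal_path_length(x.right, depth + 1)

def external_path_length(x, depth=0):
    # E(T): sum of the depths of all external (NIL) leaves.
    if x is None:
        return depth
    return external_path_length(x.left, depth + 1) + external_path_length(x.right, depth + 1)

# Example with n = 6 keys
t = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5)))
I = internal_path_length(t)
E = external_path_length(t)
print(I, E, E == I + 2 * 6)    # 8 20 True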

Traversal

Traversal (walking the tree) means the systematic exploration of the nodes of the tree in a certain order.

There are several ways to traverse the nodes of a binary tree. For the binary search tree, however, the in-order and reverse in-order traversals are clearly preferred, since they follow the imposed total order.

Traversal (single step)

The following pseudocode Next returns, starting from a node, the next element of the tree in ascending or descending order - an iterative implementation. The proposal does not need a pointer to the parent node. For this, the input object, here called cursor, must contain the entire path from the current node up to the root, and the Next function must maintain this path accordingly when it is used in a loop. The appropriate data structure for the path is the stack (English: stack), with the operations push and pop. The overflow condition of push is not treated explicitly in this proposal; in any case it could only be answered with an abnormal termination, which the push function can just as well do on its own. With height-balanced trees the memory requirement for the stack is logarithmic in the number of nodes and thus quite manageable (worked example: AVL tree).

The somewhat simpler version of the function, in which each node carries a pointer to its father and the cursor therefore manages without a stack, is given in the article binary tree. However, the memory requirement of the tree then grows by a fixed percentage.

During a longer traversal (several calls of Next), half-leaves alternate with ancestors higher up.

Next(c) {
   c: cursor {                 // this cursor contains:
      c.d: direction           // (1) Left, Right, or EndOfTree
      c.n: node                // (2) node, or tree (only for EndOfTree)
      c.t: tree                // (3) tree (contains the pointer to the root)
      c.p: path                // (4) path from the node's father up to the root
   }
   x, y: node
   x = c.n                     // starting node of this single step
   d = c.d                     // desired direction of traversal
   y = x.child[d]
   if y ≠ null then {          // one step in the given direction
      push(c.p, x)             // father of y onto the stack
      d = 1 - d                // mirror Left <-> Right
      // descend towards the leaves via the children in the mirrored direction
      x = y.child[d]
      while x ≠ null do {
         push(c.p, y)          // father of x onto the stack
         y = x
         x = y.child[d]
      }
      c.n = y                  // result: the next half-leaf in direction c.d
      return c                 // (it is a half-leaf on its (1 - c.d) side.)
   }
   // x is at the edge, so climb up through the ancestors along the ...
   do {                        // ... c.d "line" (only left or only right children)
      y = x
      x = pop(c.p)             // father of y from the stack
      if x = c.t then {        // y is the root:
         // there is no further element in this direction.
         c.n = c.t             // result: the tree as the father of the root
         c.d = EndOfTree       // signals the end of the traversal
         return c
      }
   } until y ≠ x.child[d]
   // there was a change of direction during the ascent:
   c.n = x                     // result: the first ancestor in the mirrored direction
   return c
}

A traversal over the whole tree comprises, for each node, one descent and one ascent; the cost for n nodes is thus 2·n ∈ Θ(n), so the amortized cost of a single traversal step is O(1). In the worst case a single step costs O(h), with h the height of the tree.

For recursive implementations see binary tree.
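An iterative full traversal with an explicit stack (not the single-step cursor version above) might look as follows in Python; this sketch assumes nodes with key, left and right attributes and yields the keys in ascending order:

def in_order(root):
    # Iterative in-order traversal with an explicit stack; no parent pointers needed.
    stack, x = [], root
    while stack or x is not None:
        while x is not None:        # descend to the leftmost node
            stack.append(x)
            x = x.left
        x = stack.pop()
        yield x.key                 # visit the node
        x = x.right                 # then continue in its right subtree

# Usage: for key in in_order(t): print(key)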

Proximity search

The search functions can be combined with the single-step traversal shown above into a "proximity" search. (Proximity search is used here as a working title; the concept is nothing special and certainly occurs in the literature, but no established name was found for it. "Fuzzy search" is already taken and means something quite different.) What is meant is a search in the vicinity of a given key, for example for "greater than" (or "less than"). The search functions Find, FindDupGE and FindDup above deliver, in the "not equal" case, an insertion point. Provided the tree is not empty, it contains an element that is either the smallest among the larger keys or the largest among the smaller ones. In the former case the wish "greater than" can be satisfied directly. In the latter case we go to the next element in ascending order, if there is one, and return it, since it must be a larger one. The logic for the mirrored version is obvious.

A similar approach can be found in the Index Sequential Access Method.
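Under the same assumptions as in the earlier sketches, such a proximity search, here for the smallest key greater than (or equal to) a given key s, can be written in Python as follows; this compact version folds the neighbor step into the single descent:

def find_greater(root, s, strictly=True):
    # Returns the node with the smallest key > s (or >= s if strictly=False), else None.
    candidate = None
    x = root
    while x is not None:
        if x.key > s or (not strictly and x.key == s):
            candidate = x          # x qualifies; a smaller qualifying key may lie to the left
            x = x.left
        else:
            x = x.right            # x is too small; continue to the right
    return candidate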

Insert

It is assumed that the navigation to the insertion point has already taken place. An insertion point is a node together with a direction (right or left). A direct insertion point in a binary tree is always a right (or left) half-leaf, i.e., a node without a right (or left) child. (In the view of Figure 2 these are exactly the external nodes.) An indirect insertion point is the immediate neighbor in the specified direction; together with the opposite direction it specifies the same position in the binary tree - for the actual insertion, however, the insert function must still descend to the half-leaf that represents the direct insertion point. The functions Find, FindDupGE and FindDup above deliver a direct insertion point as their result (Find only when the result is not "Equal").

To insert, one lets the insertion point (the child link in the given direction) point to the new element; in this way it is inserted correctly with respect to the total quasi-order. The complexity of the insertion operation itself is thus constant. If, however, a search operation precedes it, that search dominates the complexity.

After the insertion, the new element is a leaf of the search tree.
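A minimal sketch in Python that combines the descent to the insertion point with the constant-time link step (assuming the Node class from the earlier sketches; duplicates are placed on the right):

def insert(root, key):
    # Inserts key and returns the (possibly new) root of the tree.
    new = Node(key)
    if root is None:
        return new                       # empty tree: the new node becomes the root
    x = root
    while True:                          # descend to a half-leaf, the direct insertion point
        if key < x.key:
            if x.left is None:
                x.left = new             # the insertion itself is a single link, O(1)
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = new
                return root
            x = x.right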

By repeatedly inserting keys in ascending (or descending) order, the tree can degenerate into a linear list.

Deletion

Deletion in a binary search tree does not differ from deletion in a binary tree; there it is described how the node to be deleted is removed.

By repeated deletion the tree can likewise degenerate into a linear list.

Because of the unavoidable descents to the half-leaves, the complexity of the delete operation is O(h) in the worst case, where h is the height of the tree.
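For completeness, one common way of deleting, sketched in Python under the same assumptions as above (this is not necessarily the variant described in the referenced binary tree article): a node with two children is replaced by its in-order successor, which is found by descending to a left half-leaf in the right subtree:

def delete(root, key):
    # Removes one node with the given key (if present) and returns the new subtree root.
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:                                    # root is the node to be deleted
        if root.left is None:
            return root.right                # at most one child: splice it in
        if root.right is None:
            return root.left
        succ = root.right                    # two children: find the in-order successor,
        while succ.left is not None:         # a left half-leaf in the right subtree
            succ = succ.left
        root.key = succ.key                  # copy its key up ...
        root.right = delete(root.right, succ.key)   # ... and remove it down there
    return root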

Selection criteria

Binary search in an array can be regarded as a "precursor" of the binary search trees. Since insertions and deletions in an array incur linear cost and the memory management of the array has to be considered carefully, it is used in practice almost only for static, pre-sorted tables. If insertions or deletions are important for the application, binary trees are the more suitable choice. With respect to search time and memory, binary search in the array and height-balanced binary search trees behave asymptotically the same.

Although purely randomly generated binary search trees behave logarithmically on average, a binary search tree without any precaution against degeneration does not guarantee better than linear running time. Degeneracies can occur systematically, for example when a programmer assigns large numbers of closely spaced jump label names.

However, many concepts have been developed to ensure adequate balance. Here costs and benefits always have to be weighed against each other. For example, the effort to keep a binary search tree completely balanced at all times is so high that it should pay off only in applications whose running time is dominated by searching in an extreme way.

An important selection criterion is whether the binary tree is static, so that a single optimal construction suffices, or whether modifying operations such as insert and delete are important. For the former, weighted search trees come into consideration, among them those built by the Bellman algorithm. For the latter, height-balanced search trees such as the AVL tree and the red-black tree, but also splay trees, are of interest.

A comparison of the complexities of various search algorithms can be found in the article search tree; running time measurements on realistic examples are given in Pfaff 2004b.

In these considerations it was generally assumed that the whole tree is kept in main memory. If accesses to external media play a role, entirely different criteria come into play. The B-tree, which tries to take such considerations into account, is a search tree, but not a binary one.
