R-tree

An R - tree (English R -tree ) is a multi-dimensional (spatial ) dynamic index structure used in database systems. Similar to a B-tree we have here is a balanced index structure.

Usage

An R - tree allows fast searching in multidimensional extended objects. In addition to simple points to include polygons in two-dimensional space and geometric shapes in three-dimensional space. But are also typical high-dimensional objects, such as those found in mathematics or bioinformatics. High-dimensional here means that the number of independent search criteria is of the order of 100-1000000. Experience say, however, that an R- tree works well only up to about 10 dimensions. However, this depends on the specific data, and there are R- tree variants that contain optimizations here.

An R - tree can efficiently range queries (ie, rectangle or interval queries) answer. The evaluation of next -neighbor queries efficiently is possible for geometric distances, which can be bounded with rectangles. An R - tree is dynamic, that is, it can be updated efficiently when it changes.

Typical application of an R - tree index are geo-databases. Here, for example, road data, border areas or building data is managed. These are stored in the database as a polygons. The R-tree later allowed the efficient locating specific polygons based on the spatial coordinates. Since here the rectangle queries (eg map sections ) are typical, an R- tree is well suited to.

Less advantages of the R- tree in subspace queries ( queries where not all dimensions are specified - in this case can result in large areas of inquiry ) or non - geometric distances ( which can not be translated into rectangular areas). Here other index structures are more suitable.

Realization

The principle of an R- tree as a spatial index structure was presented by Antonin Guttman. Similar to a B * tree leaf nodes contain an R- tree to index spatial data. But in contrast to this it contains no separating keys, but stored in the index node rectangular data regions (ambient minimal rectangles), which include all data regions lying under the subtree. In the data pages data points can be stored, rectangular regions or polygons. The latter however, are often approximated by a minimal surrounding Reckteck.

Is realized by storing a rectangle by specifying intervals for all of its dimensions. Can use an R- tree -point or Enthaltenseinanfragen as " point P is contained in Figure F? ", " Figure F1 is contained in Figure F2? " Or " which characters are included in Figure F? " Be answered. Due to the approximation used (ambient minimal rectangles) in the leaf nodes may lead to false hits that need to be resolved by review of the request to concrete objects. With polygons, one speaks in the calculation of the rectangles of a " filter step " in the test with the exact polygon of the " refinement step ". The actual index structure thus provides only candidate here.

R *-tree

A popular R-tree variation of the R *-tree by Norbert Beckmann, Hans -Peter Kriegel, Ralf Schneider, Bernhard Seeger and. This variant attempts to minimize by a more developed Strategy split the overlapping of rectangular regions. This usually less subtrees must be searched in a search query, and the requests to the tree to be more efficient. In addition, the overflow of a page, elements can be fully reinserted into the tree (re -insert ), which is a split (and the rising under circumstances height of the tree ) can be avoided. Thus, a higher degree of filling is achieved and thereby also improved performance. The resulting tree has always to an R- tree, the request strategy is unchanged.

R -tree

In the R- tree and R *-tree, the search spaces could overlap. This is not in the R -tree allowed ( the search spaces are disjoint here). Enclosing rectangles if necessary " separated " in the search space boundaries. This can also be objects " cut ", so that part of the object is present in the other search space in a search space and another part of the object. In such cases, objects must then be taken more than once in the tree. R - trees can be used for static data precalculate well (especially for point data where such cutting is omitted). In case of changes, however, it is more complicated to ensure the disjointness.

Problems of R - trees

Like any index structure, R - trees a compromise, the management and updating of the tree requires additional calculations, it is also required additional memory for the index pages. Conversely, for the tree (with appropriate requests and data ) often find the records we want faster. Assuming increased memory consumption and increased computing time for changes in purchase, so you can often achieve better performance on queries. Thus, an R - tree is usually more memory than an R or R *-tree. It delivers higher performance in the search, it does require more effort when it changes.

R * - trees are a popular choice as it without any additional memory overhead (compared to an R- tree, in contrast to the R - tree of objects multiple stores if needed) get along, and their implementation and computational complexity is not much higher than in a R tree. Specifically, an R- tree in memory also always an R * - tree, they differ only in the insert and split strategy.

R- trees are order dependent, an important measure for optimization are therefore "bulk loads", in which the tree is not established by element- wise insertion, but an attempt is made to directly construct an improved tree. For this purpose, the elements are often first sorted and then constructs the tree bottom-up. If an R- tree for each element filled in an unfavorable order so he can have a ( geometrically ) unbalanced structure. Although an optimized R- tree has roughly the same depth, but its regions overlap less and therefore cover the data space evenly.

In higher dimensions, the so-called curse of dimensionality occurs. The R - tree which leads to the fact that the resulting rectangle regions overlap to ever larger parts, so that the search must always handle larger parts of the tree. This reduces the performance of the index structure. The X- tree by Stefan Berchtold, Daniel A. Keim and Hans -Peter Kriegel presents here an important variation of the R-tree concept for higher dimensionalities represents an important measure in this case are so-called " supernodes ", which are enlarged pages from the X - tree can be used to avoid unfavorable splits ( with a high overlap). In extreme cases it may even so degenerate in a linear file, which may be desired.

While an R- tree is designed as a dynamic index structure, so its performance may deteriorate due to systematic insert, or delete operations. In such cases it may be useful to rebuild the tree from time to time by a bulk load to optimize it. There are also incremental optimization strategies (such as the re - inserts in the R *-tree ) with which a database system during idle times can improve the index.

Availability

A reference implementation of the R * - tree is available ELKI the Chair of Professor Kriegel, including implementations of an M- tree and other comparative method in the software package.

An R * - tree can also be found for example in the SQLite database systems ( for 2-5 dimensions) and Oracle ( for 2-4 dimensions).

For PostgreSQL, the index type " RTREE " can be used on the Generalized Search Tree interface exists an alternative implementation names " rtree_gist ".

Coordinate system Minimum bounding rectangle Hans-Peter Kriegel Top-down and bottom-up design Quadtree K-d tree UB-tree Range tree Gridfile Java-Applet Digital Object Identifier

667943