Index (computer science)

Index structures ( indexes ) are used in computer science, in order to ensure rapid access to data in an extensive data collection. Data is typically managed sequentially on a storage medium. The processing of a query would then connected with a linear cost in the worst case, the complete data would have to be searched.

Now if a particular set of data searched by a search criterion in this data set, an elaborate search can be avoided by an index structure. The index makes it possible to determine the position of the record within the medium quickly.

Known methods

Index structures are even special data structures such as

Assorted series ( efficient search using binary search)
Hash tables
Tree structures such as binary trees, B- trees, B * - tree

For special requirements, there are also special index structures. For example, use geo-databases for indexing multi-dimensional data of the R - trees. These trees allow multidimensional search criteria and distance calculations.

Bitmap index, grid file, kd - tree, R- tree, UB- tree, range tree for multidimensional data structures

Principle of operation is an example

A typical index card system can be called an index structure: The set of addresses is divided so that the basis of the search criterion ( " name "), only a certain subset of all addresses out of the question. A typical division would be after the first letter. Then, when the name " Müller " sought only the subset that starts with "M" must be searched. This subset can usually be found quickly via a label in the file. If this subset is still too large, the subsets can be divided over the second or third letter on. Now, instead of searching all names, the names are only searched, starting with " Müller ". This can be done in a relatively short time.

To find out who is on any given day birthday, but such an index card system is not suitable: you would always have to search the entire file. It is said that the attribute "birthday" was not indicated. Remedied by a second index, in which, for each calendar day usually only a reference is stored on the primary index, typically the name here. Formally referred to something as " secondary index " or " external index ". However, in order now to send, for example, for a particular day greeting cards for birthday, it is necessary first to identify the names of the receivers, then the primary index to look up the associated addresses. The advantage is that a change of address only in the primary address register needs to be made, not in addition to the birthday list.

Any fully sorted list also constitutes a simple kind of index structure that can be searched efficiently by means of the so-called binary search. Here, the list is repeated mentally divided into two parts, and because of the existing order can easily be determined which of the two parts must be the right item. Such a search is possible, for example in a printed telephone directory. A more advanced, based on the same idea index structure is the B-tree. Significant advantages are here in a simpler upgradeability. In a hand-guided address but you have the problem that you can not insert entries, so you can realize neither a complete sorted list nor a B- tree over there - the index card system described above ( in which not fully sorted within an initial letter) However, already. On the usually out of hand amounts of data, this is not a problem.

Use

Databases are the best-known application of index structures. Because here particularly large amounts of data need to be processed - especially as large amounts of data that they do not fit entirely into the main memory - is rapid access critical here. So appropriate indexes can be defined, which lead to a significant performance increase on tables. See also database index.

Geographic information systems use index structures often in the form of a database, but sometimes can be a direct integration of the index structure may be necessary to get the desired performance. Here spatial index structures are used by the search range limit.

A less obvious application is the computer graphics, particularly in real-time applications and 3D computer games. Here spatial index structures such as the BSP tree are often used to manage the environment.

Types

Index structures can be divided according to their basic operation in various types.

Internal and external index structures

Internal index structures store the actual data itself, external structures, only references to the data ( for example, in another index). One speaks here of a "primary" or " secondary" index. A secondary index generally requires less memory and is easier to keep in sync with changes to the data. However, it is also slower, because always the actual data must be fetched from the primary index.

An example of a secondary index would be as follows: address data as described above on index cards, sorted by name stored. As a secondary index, a list is maintained that maps the name to the family name, so for example "Max " to " Smith" and " Smith". However, it is placed here only the name, not the full address. Changes such as the telephone number, so it is enough to update this in the primary index. But if you want the phone number of each "Max" determine, the family name must be looked up first in the secondary index, then with the family name in the primary index telephone number. If one were to take the full address under the name, this would be a second primary index, any change would have to be made but in both card indexes.

Secondary indexes allow emulation of multidimensional index structures. However, they are inferior in many applications, a real multi-dimensional index structure.

Dynamic and static index structures

An index structure is called dynamic if they dynamically adapt its structure when changes to the data on this, otherwise it is called static.

Static index structures allow faster access to the data, but must be calculated costly new to changes to the data. Thus, on average, half of the stored data must be moved, for example, when inserting into a sorted row. Dynamic index structures make it possible to update the index dynamically on every change. It is also said that when changes only a "local" change is made to the index structure, so only a small part of the index changed as a result.

A typical example of a static index structure, the ( static ) and dynamic hash table index structure of the B- tree. The tree depth and structure of the data set is automatically adjusted here. Changes to the tree frequently happen only in the affected blade or a path from the root to the concerned sheet.

One - dimensional and multidimensional index structures

Typical index structures indexed by a single attribute ( for example, for " Last Name "), even if the data contains additional attributes. Order ( or combination thereof ) to allow requests for such additional attributes, additional one-dimensional secondary indexes are created. But if you want after two attributes request simultaneously, the average must be calculated. This large amount of data to be loaded first and then discarded in the calculation of the cut in unfavorable cases.

An index structure can answer the queries with more than one attribute (eg, " first name " and "Last Name" at the same time ) efficiently is, multidimensional. This is used primarily for spatial data application, for example in geographical information systems. Queries of the type " What is the nearest gas station " or " All objects in the rectangle R " can not be answered on the basis of a single dimension, but require a rectangle query in the index structure. Requests for only a single attribute can still be answered then accelerates often by so selects the query rectangle that completely covers the range of values of the unnecessary attributes.

Clustered and non-clustered index structures

An index structure is called clustered, when the data is physically sorted exactly as they are needed in the index and in queries. For range queries can be answered efficiently, while at successive operations of adjacent elements mostly the same data pages are required, and can be buffered so.

An alphabetical sortieres after the family name address with their own pages for each letter is functionally equivalent to a clustered index. A hash table is not clustered normally; the data of adjacent keys are usually in very different data pages. A B tree is an example of a clustered index structure, each data page corresponds to a disjoint interval of the key space.

Small and densely occupied index structures

An index structure is called sparse if it in their data structures also "gaps" (ie unoccupied memory positions ) allowed. Therefore Sparse index structures occupy additional memory, and finding those items need to be addressed. For dynamic index structures, however, this is also an important advantage: in such a gap easily add new data can be inserted. Conversely, new gaps can occur without requiring the index to be laboriously re-organized the removal of data. The reduced reorganization effort is therefore often changing data at a distinct advantage in spite of the increased memory consumption and increased search time. Many sparse index structures allow for dynamic resizing, and try for example to ensure that they are not more than twice as much space as necessary.

An ordered set is an example of a densely occupied index structure. Hash tables and trees are sparse in general.

Assessment

Disadvantages of index structures are an additional administrative burden of the structure itself, and occasionally a high memory cost.

Data structure
Database index

Sequence (disambiguation) Gridfile K-d tree UB-tree Table (information) Cache (computing)

377987