Heapsort

Heapsort ( "heap sort " ) is a process developed in the 1960s by Robert W. Floyd and JWJ Williams sorting method. Its complexity is at an array of length n in the Landau notation expressed in and is thus asymptotically optimal for Sort by comparison. Although heapsort works in-place, but is not stable. The heapsort algorithm uses a binary heap as a central data structure. Heapsort can be used as an improvement of selection sort are understood and is related to treesort.

Algorithm

The input is an array of the elements to sort. First, the input is turned into a binary max-heap. It follows directly from the heap property that the largest element now sits at the first array position. This element is swapped with the last array element, and the heap size is decreased by 1 without releasing the memory. The new root of the heap may violate the heap property; the heapify operation repairs the heap where necessary, so that the next larger or equal element now sits at the first array position. The swap, size-reduction and heapify steps are repeated until the heap size is 1. The input array then contains the elements in ascending sorted order. In pseudocode:

heapsort(Array A)
  build(A)
  assert(isHeap(A, 0))
  tmp = A.size
  while (A.size > 1)
    A.swap(0, A.size - 1)
    A.size = A.size - 1
    heapify(A)
    assert(isHeap(A, 0))
  A.size = tmp
  assert(isSorted(A))

For sorting in descending order, a min-heap is used instead of the max-heap. In a min-heap the smallest element sits at the first position. According to the definition of a binary heap, the order of the elements in a heap is determined by a comparison operation (mathematically: a relation) that defines a total order on the elements. In a min-heap this is the relation ≤, in a max-heap the relation ≥. The pseudocode abstracts from the comparison operation.

The items to be sorted are also referred to as (sorting) keys. The input array can contain several data components per index position. In that case, one component must be defined as the sort key on which the comparison operation works. The swap operation exchanges complete array entries.

The assert operations in the pseudocode document which properties the array correctly fulfills, and must fulfill, after which algorithm steps.
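The following is a minimal sketch of the procedure in C, corresponding to the pseudocode above. The helper name sift_down and the exact index arithmetic are one possible choice and not prescribed by the pseudocode.

#include <stddef.h>

/* Restore the max-heap property below index root for a heap of size n
 * stored in data[0..n-1]; this plays the role of heapify in the pseudocode. */
static void sift_down(int *data, size_t n, size_t root)
{
    size_t parent = root;
    for (;;) {
        size_t child = 2 * parent + 1;           /* left child in the level-order layout */
        if (child >= n)
            break;                               /* parent is a leaf */
        if (child + 1 < n && data[child + 1] > data[child])
            child++;                             /* pick the larger of the two children */
        if (data[child] <= data[parent])
            break;                               /* heap property already holds */
        int tmp = data[parent];                  /* otherwise swap and descend */
        data[parent] = data[child];
        data[child] = tmp;
        parent = child;
    }
}

void heapsort_int(int *data, size_t n)
{
    if (n < 2)
        return;
    /* build: turn the array into a max-heap (corresponds to build(A)) */
    for (size_t i = n / 2; i > 0; --i)
        sift_down(data, n, i - 1);
    /* repeatedly move the maximum behind the shrinking heap */
    for (size_t size = n; size > 1; --size) {
        int tmp = data[0];                       /* A.swap(0, A.size - 1) */
        data[0] = data[size - 1];
        data[size - 1] = tmp;
        sift_down(data, size - 1, 0);            /* heapify(A) on the reduced heap */
    }
}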

Example of Heapsort with numbers

The picture shows the sorting of the example number sequence

23 1 6 19 14 18 8 24 15

with the heapsort algorithm. The individual sub-images are arranged chronologically from left to right and from top to bottom. The first sub-image shows the unsorted input and the last the sorted output. The transition from the first to the second sub-image corresponds to the heapification of the input array. Elements involved in a swap operation are marked red and with dashed arrows, thick double arrows indicate the elements involved in a heapify operation, and elements highlighted in green show the already sorted part of the array. The element indices are drawn as small black nodes, each at the lower left of the element value. A blue background of the indexed array elements indicates the extent of the heap during the run of the heapsort procedure.

The indices correspond to an ascending numbering in level order, starting with 0. In an implementation of the algorithm, the tree structure is implicit and the elements are stored contiguously in the array, as indicated by the placement of the element indices in the picture.
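For this implicit, 0-based level-order layout, the parent and child positions can be computed directly from an element's index. The following helper functions are an illustrative sketch of that index arithmetic; the names are not taken from the article.

#include <stddef.h>

/* Index arithmetic of the implicit binary heap in a 0-based array. */
static size_t parent_index(size_t i)      { return (i - 1) / 2; }   /* only valid for i > 0 */
static size_t left_child_index(size_t i)  { return 2 * i + 1; }
static size_t right_child_index(size_t i) { return 2 * i + 2; }

/* Example: the children of the root (index 0) are at indices 1 and 2,
 * and the parent of index 8 is index 3. */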

Efficiency

One can show that building the heap can be done in O(n) steps (in terms of Landau notation). On a large, randomly distributed data set (100 to 10^10 data elements), on average more than 4 but fewer than 5 significant sorting operations per element are necessary (2.5 data comparisons and 2.5 assignments). This is because a random element finds a suitable parent node with exponentially increasing probability (60%, 85%, 93%, 97%, ...).

The heapify operations require a total of O(n log n) steps in the worst case. This occurs precisely for inversely ordered input. Average case and worst case, however, differ only slightly in practice. More favorable is only an array whose elements almost all have the same value. But if fewer than about 80% of the data are identical, the running time already corresponds to the average case. An advantageous arrangement of data with many distinct values is inherently impossible, since it would contradict the heap property.

The worst case arises largely with presorted data, because the heap construction de facto amounts to a gradual complete inversion of the sort order. The most favorable, but unlikely, case is an already inversely sorted array (1 comparison per element, no assignment). The same applies when almost all of the data are identical.

For heterogeneous data, presorted or not, heapify dominates with at least 60% of the time, usually about 80%. Heapsort thus guarantees a total running time of O(n log n). Even in the best case, a running time on the order of n log n is needed.

Comparison with other sorting methods

On average, heapsort is faster than quicksort only when comparisons on the data to be sorted are very expensive and, at the same time, the data arrangement is unfavorable for quicksort (e.g. many equal elements). In practice, for unsorted or partially sorted data, quicksort or introsort is faster than heapsort by a constant factor (2 to 5). However, this is disputed, and there are analyses that see heapsort ahead, both for implementation reasons and on information-theoretic grounds. The worst-case behavior of quicksort, O(n²), speaks against it and in favor of heapsort, though. Introsort, on the other hand, is 20% to 30% slower than heapsort only in degenerate cases, and faster in almost all cases.

Variants

Bottom-Up Heapsort

The most important variant of the heapsort algorithm is bottom-up heapsort, which can often save almost half of the required comparison operations and therefore pays off in particular where comparisons are expensive. (See there also the scientifically important references on heapsort.)
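A sketch of the idea in C, assuming the same 0-based array layout as above: the element to sink is first sent down along the path of larger children all the way to a leaf, without comparing it against the intermediate elements, and only then is its final position searched for on the way back up. The function names are illustrative, not part of a fixed interface.

#include <stddef.h>

/* Follow the path of the larger children from index i down to a leaf. */
static size_t leaf_search(const int *a, size_t n, size_t i)
{
    size_t j = i;
    while (2 * j + 2 < n)                          /* both children exist */
        j = (a[2 * j + 2] > a[2 * j + 1]) ? 2 * j + 2 : 2 * j + 1;
    if (2 * j + 1 < n)                             /* only a left child exists */
        j = 2 * j + 1;
    return j;
}

/* Bottom-up variant of the sift-down: one comparison per level on the way
 * down, and usually only a few on the way back up. */
static void sift_down_bottom_up(int *a, size_t n, size_t i)
{
    int val = a[i];
    size_t j = leaf_search(a, n, i);
    while (a[j] < val)                             /* climb until val fits */
        j = (j - 1) / 2;
    int carry = a[j];                              /* place val and rotate the */
    a[j] = val;                                    /* path elements one level up */
    while (j > i) {
        j = (j - 1) / 2;
        int tmp = a[j];
        a[j] = carry;
        carry = tmp;
    }
}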

Smoothsort

Normal heapsort does not sort arrays that are already largely presorted any faster than other arrays. The largest elements always have to travel all the way to the front, to the top of the heap, before they are moved back toward the end. Smoothsort changes this by, in principle, reversing the order of the heap. However, considerable effort is then required to maintain the heap state during sorting.

Ternary heap

Another optimization uses ternary instead of binary heaps. These heaps are based on trees in which every fully occupied node has 3 children, rather than on binary trees. This reduces the number of comparison operations by roughly 3 to 4% (for a few thousand to a few million elements). In addition, the other overhead drops considerably; in particular, because the heap is shallower, almost a third fewer elements have to be moved while sinking a value (heapify).

In a binary heap, an element can be sunk by n levels with 2n comparisons and thereby skips on average 3/2·(2^n − 1) indices. In a ternary heap, an element can be sunk by m levels with 3m comparisons and thereby skips on average 3^m − 1 indices. If, for the same reach, the relative effort is defined as x := 3m/(2n), then

3/2·(2^n − 1) = 3^m − 1,

and so

x = 3·log_3(3/2·(2^n − 1) + 1) / (2n).

For n = 18 (realistic for just under 1 million elements) this gives 0.977; x falls below 1 above n = 10. In practice the savings are somewhat larger, for example because ternary trees have more leaves and therefore fewer elements have to be sunk when building the heap.
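As a quick plausibility check of the formula above, the following small C program evaluates x for a range of heap depths n. It is only an illustrative calculation, not part of the original analysis.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* For each binary heap depth n, compute the ternary depth m with the
     * same average reach and the relative comparison effort x = 3m/(2n). */
    for (int n = 8; n <= 20; n += 2) {
        double reach = 1.5 * (pow(2.0, n) - 1.0);      /* 3/2 * (2^n - 1) */
        double m = log(reach + 1.0) / log(3.0);        /* solves 3^m - 1 = reach */
        double x = 3.0 * m / (2.0 * n);
        printf("n = %2d  m = %6.3f  x = %.3f\n", n, m, x);
    }
    return 0;   /* prints x ≈ 0.98 around n = 18 and x > 1 for n <= 10 */
}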

Overall, sorting with ternary heaps can save 20 to 30% of the time for large arrays (several million elements) with a simple comparison operation. For small arrays (up to about a thousand elements), ternary heaps do not pay off or are even slower.

N-ary heaps

With even wider, flatter heap trees, the number of required comparison operations rises again. Nevertheless, they can be advantageous, because the other overhead (mainly for copying elements) keeps falling and the elements are accessed more systematically, which can improve cache efficiency. If only a simple numeric comparison is performed, the arrays are large, and write operations are relatively slow, 8 or more children per node can be optimal.

A simple example implementation with heaps of arbitrary arity (WIDTH, with WIDTH >= 2) in the C programming language:

#include <assert.h>
#include <stddef.h>

static size_t heapify(int *data, size_t n, size_t WIDTH, size_t root)
{
    assert(root < n);
    size_t count = 0;                           /* number of comparisons performed */
    int val = data[root];
    size_t parent = root;
    size_t child;
    assert(parent * WIDTH >= parent);           /* overflow test */
    while ((child = parent * WIDTH + 1) < n) {  /* is there at least one child? */
        size_t end = child + WIDTH;             /* search for the largest child */
        if (end > n)
            end = n;
        size_t max = child;
        for (size_t i = child + 1; i < end; ++i) {
            ++count;
            if (data[i] > data[max])
                max = i;
        }
        child = max;
        ++count;
        if (data[child] <= val)                 /* no child greater than the */
            break;                              /* value to sink */
        data[parent] = data[child];             /* move the largest child up */
        parent = child;                         /* continue in the level below */
    }
    data[parent] = val;                         /* put the sifted value in place */
    return count;
}

size_t nhsort(int *data, size_t n, size_t WIDTH)
{
    assert(WIDTH > 1);
    if (n < 2)
        return 0;
    size_t m = (n + WIDTH - 2) / WIDTH;         /* first leaf in the tree */
    size_t count = 0;                           /* counter for the number of comparisons */
    assert(m > 0);
    for (size_t r = m; r > 0; --r)              /* part 1: construction of the heap */
        count += heapify(data, n, WIDTH, r - 1);
    for (size_t i = n - 1; i > 0; --i) {        /* part 2: the actual sorting */
        int tmp = data[0];                      /* move the top of the heap behind the heap */
        data[0] = data[i];
        data[i] = tmp;                          /* and extend the sorted range */
        count += heapify(data, i, WIDTH, 0);
    }
    return count;
}

For comparison purposes, the function returns the number of comparisons performed.
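A minimal usage example (not part of the original listing) might look like this, assuming the functions above are defined in the same file:

#include <stdio.h>

int main(void)
{
    int numbers[] = { 23, 1, 6, 19, 14, 18, 8, 24, 15 };   /* the example sequence from above */
    size_t n = sizeof numbers / sizeof numbers[0];
    size_t comparisons = nhsort(numbers, n, 4);             /* sort with a 4-ary heap */
    for (size_t i = 0; i < n; ++i)
        printf("%d ", numbers[i]);
    printf("\n(%zu comparisons)\n", comparisons);
    return 0;
}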
