Shellsort

Shellsort is a sorting method developed by Donald L. Shell in 1959, based on the method of sorting by direct insertion (insertion sort).

Principle

Shellsort is based in principle on insertion sort. It tries to compensate for insertion sort's disadvantage that elements often have to be moved over long distances within the sequence, which makes insertion sort inefficient. Shellsort's approach is to first decompose the sequence into several sub-sequences and sort those. At each step the decomposition uses a different spacing. The elements are not copied for the decomposition; instead, each sub-sequence consists of elements a fixed constant distance apart. For example, with a spacing of 4 the sequence is split into four sub-sequences whose elements are taken from the original sequence at distance 4: indices 0, 4, 8, ... form one sub-sequence, indices 1, 5, 9, ... another, and so on. Once this sorting step is complete, the sequence is called 4-sorted. A complete Shellsort might, for example, first produce a 4-sorted sequence, then a 2-sorted one, and finally apply ordinary insertion sort, so to speak 1-sorting the sequence.
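The decomposition described above can be sketched as a single h-sorting pass (a minimal sketch; the class and method names are illustrative, not from the original article):

```java
// One h-sorting pass: runs insertion sort simultaneously on every
// sub-sequence of elements that are h positions apart
// (indices 0, h, 2h, ... / 1, h+1, 2h+1, ... and so on).
class HSortDemo {
    static void hSort(int[] a, int h) {
        for (int i = h; i < a.length; i++) {
            int t = a[i];
            int j = i;
            // shift larger elements of the same sub-sequence h positions up
            while (j >= h && a[j - h] > t) {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = t;
        }
    }
}
```

Calling hSort with h = 4, then h = 2, then h = 1 performs exactly the 4-, 2-, and 1-sorting steps described above.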

This can be illustrated using auxiliary matrices (see example):

  • The data is written row-wise into a k-column matrix
  • The columns of the matrix are sorted individually

This results in a roughly sorted sequence. The step is repeated several times, each time reducing the width of the matrix, until the matrix consists of only a single, fully sorted column.

Important: an a·b-sorted sequence is not automatically a-sorted or b-sorted! For a counterexample, consider a sequence of the numbers 1 to 12: it is 6-sorted whenever an arbitrary permutation of the numbers 1..6 is followed by an arbitrary permutation of the numbers 7..12. The permutation 6, 5, 4, 3, 2, 1 is by no means 2-sorted or 3-sorted.

Thus 6, 5, 4, 3, 2, 1, 7, 8, 9, 10, 11, 12 is 6-sorted, but neither 2-sorted nor 3-sorted. Shellsort works in-place, but it is not one of the stable sorting algorithms: because of the sorting distance, the method loses the property of stability. Two adjacent equal elements can end up in different sub-sequences and may be rearranged so that their order is reversed.
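The loss of stability can be demonstrated with a tiny example (a sketch; the labels attached to the keys are only there to make the reordering visible):

```java
// Keys {2, 2, 1} carry labels to track the original order of equal keys.
// With the gap sequence {2, 1}, the 2-pass moves the element labeled "a"
// past the element labeled "b", and the stable 1-pass cannot undo this.
class StabilityDemo {
    static void shellsort(int[] keys, String[] labels, int[] gaps) {
        for (int h : gaps) {
            for (int i = h; i < keys.length; i++) {
                int k = keys[i];
                String lab = labels[i];
                int j = i;
                // strict '>' means each single pass is stable on its own
                while (j >= h && keys[j - h] > k) {
                    keys[j] = keys[j - h];
                    labels[j] = labels[j - h];
                    j -= h;
                }
                keys[j] = k;
                labels[j] = lab;
            }
        }
    }
}
```

Sorting the keys {2, 2, 1} with gaps {2, 1} yields keys {1, 2, 2} but labels {"c", "b", "a"}: the two equal keys originally ordered "a" before "b" have swapped their relative order.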

Example

Sorting the numbers 2, 5, 3, 4, 3, 9, 3, 2, 5, 4, 1, 3 using the step sequence 4, 2, 1.

First, the data is written into a matrix with four columns and the columns are sorted. The number sequence is then 4-sorted.

2 5 3 4     2 4 1 2
3 9 3 2  →  3 5 3 3
5 4 1 3     5 9 3 4

The 4-sorted sequence is now written into a matrix with two columns, reading from left to right, and these columns are sorted. The sequence is then 2-sorted.

2 4     1 2
1 2     2 3
3 5  →  3 4
3 3     3 4
5 9     3 5
3 4     5 9

The 2-sorted two-column matrix is now written into a single column and sorted with ordinary insertion sort. The advantage is that no element has to be moved as far as it would be if insertion sort were applied directly to the unsorted sequence.

1 2 2 3 3 4 3 4 3 5 5 9 → 1 2 2 3 3 3 3 4 4 5 5 9

However, the step sequence used here (as originally proposed by Shell in 1959) has proven unsuitable in practice, because until the very last step even positions are only compared with even positions and odd positions with odd positions. The sequence 1, 4, 13, 40, ... (value_n = 3 × value_{n−1} + 1) has proven more suitable.
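The rule value_n = 3 × value_{n−1} + 1 can be turned into a small gap generator (a sketch; the class and method names are illustrative):

```java
// Generates the step sequence 1, 4, 13, 40, ... up to (but not including)
// the limit, returned largest-first so it can drive a Shellsort directly.
class KnuthGaps {
    static int[] descendingGaps(int limit) {
        java.util.List<Integer> gaps = new java.util.ArrayList<>();
        for (int h = 1; h < limit; h = 3 * h + 1) {
            gaps.add(h);
        }
        int[] out = new int[gaps.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = gaps.get(gaps.size() - 1 - i); // reverse: largest gap first
        }
        return out;
    }
}
```

For an array of 100 elements this yields the steps 40, 13, 4, 1.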

Implementation

Java 5

Shellsort is an improvement of the straight-insertion algorithm. There, the elements wander toward their place in single steps: after the smallest element is found, the elements in between are pushed up one by one, and only the smallest element "jumps". Each of the n elements is moved from its original place to its final place in an average of n/3 steps.

Shellsort is carried out with decreasing step sizes k[1], k[2], ..., k[t], where the last step size is always k[t] = 1. The t steps are performed sequentially; in the m-th step, the elements jump k[m] positions at a time in the direction of their future place. First, those elements that are k[1] positions apart are sorted among themselves; then those that are k[2] positions apart, and so on. The effect of this approach is that in the early rounds the elements "jump" toward their place not one position at a time but k positions at a time.

The last step size k[t] is 1, i.e., an ordinary straight-insertion sort is performed at the end. This guarantees that the sequence is sorted when the algorithm finishes. However, this final pass has hardly anything left to do, because the previous steps have already sorted the sequence almost completely.

A suitable choice of the step sizes k[1], k[2], ..., k[t] can reduce the sorting effort significantly. For the step sizes (1, 3, 7, 15, 31, ...) it has been shown (see Donald E. Knuth: The Art of Computer Programming) that the time complexity of the algorithm is O(n^1.5), which is significantly better than the quadratic complexity of bubblesort, insertion sort, or selection sort, but (at least for very large data sets) worse than the O(n log n) complexity of mergesort or heapsort.

static <E extends Comparable<? super E>> void shellsort(E[] sammlung, int[] schrittweiten) {
    for (int schrittweite : schrittweiten) {
        // straight insertion with step width schrittweite
        for (int index = schrittweite; index < sammlung.length; index++) {
            E elementZumEinfuegen = sammlung[index];
            int einfuegestelle = index;
            while (einfuegestelle - schrittweite >= 0
                    && elementZumEinfuegen.compareTo(sammlung[einfuegestelle - schrittweite]) < 0) {
                sammlung[einfuegestelle] = sammlung[einfuegestelle - schrittweite];
                einfuegestelle -= schrittweite; // jump by the step width
            }
            sammlung[einfuegestelle] = elementZumEinfuegen;
        }
    }
}

Java

In practice, the data is not arranged in the form of a matrix but in a one-dimensional array. The columns are formed by clever indexing, so that all elements of one column are a distance h apart. The columns are sorted with insertion sort, since that algorithm benefits from pre-sorted data.

In the following program, the data is first arranged in h = 2147483647 columns, provided enough data is available. If not, the inner for loop over i is skipped, and the program proceeds with h = 1131376761, and so on.

void shellsort(int[] a, int n) {
    int i, j, k, h, t;
    int[] spalten = { 2147483647, 1131376761, 410151271, 157840433,
        58548857, 21521774, 8810089, 3501671, 1355339, 543749, 213331,
        84801, 27901, 11969, 4711, 1968, 815, 271, 111, 41, 13, 4, 1 };

    for (k = 0; k < 23; k++) {
        h = spalten[k];
        // sort the "columns" with insertion sort
        for (i = h; i < n; i++) {
            t = a[i];
            j = i;
            while (j >= h && a[j - h] > t) {
                a[j] = a[j - h];
                j = j - h;
            }
            a[j] = t;
        }
    }
}

Complexity and gap sequences

The complexity of Shellsort depends on the choice of the gap sequence that determines the number of columns h. Upper bounds on the complexity have been proven for various sequences, which gives a clue to the expected running time. Most theoretical work on gap sequences considers only the number of comparisons as the significant cost factor; in real implementations, however, the loop and copy operations turn out to play a decisive role even for arrays that are not huge.

Donald Shell originally proposed the sequence 1, 2, 4, 8, 16, 32, ..., 2^k. Its performance is very poor, however, because elements at odd positions are only compared with elements at even positions in the very last step. The worst-case complexity is O(n²).

With Hibbard's sequence 1, 3, 7, 15, 31, 63, ..., 2^k − 1, a complexity of O(n^1.5) is achieved.

With Pratt's sequence 1, 2, 3, 4, 6, 8, 9, 12, 16, ... (all numbers of the form 2^p · 3^q), the complexity is O(n log² n).

Donald Knuth worked out several sequences for Shellsort. One commonly used in the literature is: 1, 4, 13, 40, 121, 364, 1093, ..., (3^k − 1)/2. Better known is the equivalent calculation rule for the same sequence: h[k] = 3·h[k−1] + 1. The complexity is O(n^1.5).

Some good sequences come from Robert Sedgewick. The sequence 1, 8, 23, 77, 281, 1073, 4193, 16577, ..., 4^k + 3·2^{k−1} + 1 achieves a complexity of O(n^{4/3}). An even better one is the following: 1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, 8929, 16001, ..., given by 9·2^k − 9·2^{k/2} + 1 for even k and 8·2^k − 6·2^{(k+1)/2} + 1 for odd k.
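The two closed forms for even and odd k can be checked term by term (a sketch; the class name is illustrative and the formulas are as reconstructed in the code comments):

```java
// k-th element of Sedgewick's second sequence:
//   even k: 9*2^k - 9*2^(k/2) + 1
//   odd k:  8*2^k - 6*2^((k+1)/2) + 1
class SedgewickGaps {
    static long term(int k) {
        if (k % 2 == 0) {
            return 9L * (1L << k) - 9L * (1L << (k / 2)) + 1;
        }
        return 8L * (1L << k) - 6L * (1L << ((k + 1) / 2)) + 1;
    }
}
```

The first terms come out as 1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, 8929, 16001.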

Considering purely geometric sequences, there is a minimum in the neighborhood of a factor of 2.3, i.e., successive gaps should have a ratio of roughly 2.3. One of the theoretically best sequences (in terms of the number of comparisons), determined experimentally by Marcin Ciura, is 1, 4, 10, 23, 57, 132, 301, 701, 1750, which is based on this factor. Based on the factor 1750/701 ≈ 2.5, the sequence can be continued as follows: if g is the last element, the next is given by ⌊2.5 · g⌋ + 1, i.e., 701, 1753, 4383, 10958, 27396, 68491, 171228, ... A sequence by Gonnet and Baeza-Yates is based on a factor of 2.2 and gives very good results.
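The continuation rule ⌊2.5 · g⌋ + 1 can be expressed in integer arithmetic, since ⌊2.5·g⌋ = (5·g)/2 under integer division (a sketch; the class name is illustrative):

```java
// Extends Ciura's sequence past its last experimentally determined
// element: next gap = floor(2.5 * g) + 1.
class CiuraExtension {
    static long next(long g) {
        return (5 * g) / 2 + 1; // integer division implements the floor
    }
}
```

Starting from 701 this yields 1753, 4383, 10958, 27396, 68491, 171228, ...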

Surprisingly, sequences that are computed recursively perform even better in practice than Marcin Ciura's: the running time of Shellsort is shorter although the number of comparisons is higher (for elements that are integers of register width). Recursive sequences are computed from integers, and the ratio of successive elements converges to a fixed value; for the Fibonacci numbers that value is the golden ratio.

One such sequence is based on the Fibonacci numbers. One of the two leading 1s is omitted, and each number of the sequence is raised to the power of twice the golden ratio (approximately 3.236), which leads to this gap sequence: 1, 9, 34, 182, 836, 4025, 19001, 90358, 428481, 2034035, ...
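The construction can be reproduced directly: take the Fibonacci numbers 1, 2, 3, 5, 8, 13, ... (one leading 1 omitted) and raise each to the power 2φ = 1 + √5 (a sketch; the class name is illustrative, and floating-point rounding is assumed to be benign for the first terms):

```java
// k-th gap: the k-th Fibonacci number (with one leading 1 dropped)
// raised to the power 2*phi = 1 + sqrt(5) ≈ 3.236, rounded down.
class FibonacciGaps {
    static long gap(int k) {
        long f0 = 1, f1 = 2; // Fibonacci sequence with one leading 1 omitted
        for (int i = 0; i < k; i++) {
            long t = f0 + f1;
            f0 = f1;
            f1 = t;
        }
        return (long) Math.pow(f0, 1.0 + Math.sqrt(5.0));
    }
}
```

This reproduces the first elements 1, 9, 34, 182, 836, 4025 of the sequence above.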

Another recursive sequence was found by Matthias Fuchs. The sequence 1, 4, 13, 40, 124, 385, 1195, 3709, 11512, 35731, ... has a convergence value of approximately 3.103803402. The calculation rule is f[k+1] = 3·f[k] + f[k−2]; the sequence starts with 1, 1, 1, and for Shellsort the first two 1s are omitted.

Other sequences are not constant but are computed from the current number of elements in the array. They are initialized with that number and shrink until they finally arrive at 1. Robert Kruse: h[k−1] = h[k]/3 + 1. Gonnet and Baeza-Yates: h[k−1] = (5·h[k] − 1)/11. Both sequences perform slightly worse than the two recursive sequences and the very good sequences of Sedgewick and Marcin Ciura, but they can be computed directly within the Shellsort algorithm.
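Kruse's rule h[k−1] = h[k]/3 + 1 can be built directly into the sort, with no precomputed gap table (a sketch; the class name is illustrative):

```java
// Shellsort with Robert Kruse's dynamic gap rule: the gap starts at the
// array length and shrinks via h = h/3 + 1 down to the final 1-pass.
class KruseShellsort {
    static void sort(int[] a) {
        int h = a.length;
        do {
            h = h / 3 + 1;
            // one gapped insertion-sort pass with the current gap h
            for (int i = h; i < a.length; i++) {
                int t = a[i];
                int j = i;
                while (j >= h && a[j - h] > t) {
                    a[j] = a[j - h];
                    j -= h;
                }
                a[j] = t;
            }
        } while (h > 1);
    }
}
```

Since h/3 + 1 maps 1 to 1, the loop always ends with an ordinary insertion-sort pass, which guarantees a fully sorted result.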

The existence of a sequence with complexity O(n log n) was ruled out in a paper by Bjorn Poonen, Plaxton, and Suel. It was proven, however, that for sufficiently large n a sequence with a complexity of O(n^{1+ε/√log n}), for arbitrary ε > 0, can always be found.

The search for an optimal sequence turns out to be extremely difficult. Distances between sequence elements that are too large cause displacements over too long ranges; distances that are too small cause too many passes before the final sort. When choosing a sequence, consecutive elements with common divisors should be avoided, since an a·b-sorting followed by an a·c-sorting excludes certain sub-sequences from the sort (see the remark on the original sequence 1, 2, 4, 8, 16, ..., which leaves out the odd positions until the final 1-sorting). Between elements several positions apart, on the other hand, common divisors can be quite beneficial.

A significant advantage of Shellsort over other algorithms is that it can exploit existing pre-sortedness. It matters little whether the array is already sorted or inversely sorted; both cases are faster by factors than a purely random array. With 65536 elements the speed advantage is about a factor of 4; with as few as 128 elements it is more than a factor of 2.

Variations

Combsort works similarly to Shellsort: the columns are sorted with only a single pass of bubblesort before the number of columns is reduced.
