Flashsort

Flashsort is a distribution-based sorting algorithm (a distribution sort). For uniformly distributed input data it has linear complexity.

Concept

The idea is that, for a data set with a known probability distribution, one can directly estimate where an element should end up in the sorted output once the lower and upper bounds of the value range are known. For example, given a uniformly distributed set of numbers whose minimum is 1 and whose maximum is 100, it is a good assumption that an element with the value 50 will end up near the middle of the sorted set. In this family of sorting methods, this estimated position is called the element's class. If the classes are numbered 1 to m, the class K(A_i) of an element A_i is computed as follows:

K(A_i) = 1 + \left\lfloor (m-1)\,\frac{A_i - A_{\min}}{A_{\max} - A_{\min}} \right\rfloor ,

where A is the unsorted input set, m is the number of classes, and A_min and A_max are the minimum and maximum of the input. The value range covered by each class is (approximately) the same size, except for the last class, which contains only the maximum (or maxima). This classification guarantees that every element of a given class is greater than every element of a lower class; it establishes a partial order on the data and reduces the number of inversions. Insertion sort is then applied to the classified set. As long as the input data are uniformly distributed, the classes are of approximately equal size, i.e. each contains only a few elements, so the sorting within the classes can be carried out efficiently.
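
As an illustration, here is a minimal Python sketch of the classification step (the names classify, a_min, a_max and m are illustrative, not taken from Neubert's published code); the classes are numbered 0 to m-1 here instead of 1 to m:

    def classify(value, a_min, a_max, m):
        # Linear map of the value range onto the classes: the maximum
        # falls into the last class, and all other classes cover value
        # ranges of (approximately) equal width.
        return int((m - 1) * (value - a_min) / (a_max - a_min))

For example, classify(50, 1, 100, 10) yields 4: a value in the middle of the range 1..100 is assigned to a class in the middle of the 10 classes.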

Memory-efficient implementation

To run Flashsort with low memory requirements, one can dispense with separate data structures for the classes. Instead, the upper bound of each class within the input vector can be stored in an auxiliary vector. These upper bounds are determined by counting, in one pass, the number of elements that fall into each class; the upper bound of a class is then the upper bound of the preceding class plus the number of elements in the class itself. The entries of this vector can afterwards be used as pointers into the classes.
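
A sketch of this counting pass, under the same assumptions as above (class_pointers, count and ptr are illustrative names; indexing is 0-based). The returned vector holds, for each class, the index of its topmost free slot, which initially coincides with the class's upper bound:

    def class_pointers(data, m, a_min, a_max):
        # One pass over the input to count the elements of each class.
        count = [0] * m
        for x in data:
            count[classify(x, a_min, a_max, m)] += 1
        # Upper bound of class k = upper bound of class k-1 plus the
        # number of elements in class k, stored as a pointer to the
        # topmost (still free) slot of the class.
        ptr = [0] * m
        total = 0
        for k in range(m):
            total += count[k]
            ptr[k] = total - 1
        return ptr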

The distribution into the classes is implemented as a sequence of cycles in which an element is taken from the input vector and its class is computed. The pointer for that class in the auxiliary vector indicates a free position within the class, where the element is inserted; after the insertion, the pointer is decremented. The insertion takes place in the input vector itself and evicts another element, which is then inserted at its own appropriate position next. A cycle ends when an element is inserted at the position from which the first element of the cycle was taken.
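
The cycle phase might look as follows; this sketch is closely modeled on the published pseudocode, again with 0-based indexing (permute_in_place, n_moved and flash are illustrative names). The test used to skip classified elements is justified in the next paragraph:

    def permute_in_place(data, ptr, m, a_min, a_max):
        n = len(data)
        n_moved = 0
        j = 0
        k = classify(data[0], a_min, a_max, m)
        while n_moved < n - 1:
            # Skip classified elements (index greater than the class
            # pointer) to find the leader of the next cycle.
            while j > ptr[k]:
                j += 1
                k = classify(data[j], a_min, a_max, m)
            flash = data[j]           # taking the leader opens a hole at j
            while j != ptr[k] + 1:    # cycle ends once the hole is refilled
                k = classify(flash, a_min, a_max, m)
                # Drop the travelling element into the topmost free slot
                # of its class and pick up the element stored there.
                flash, data[ptr[k]] = data[ptr[k]], flash
                ptr[k] -= 1
                n_moved += 1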

An element is a valid start element for a cycle if it has not yet been classified. While the algorithm iterates over the input vector, already classified elements are skipped and unclassified elements are used to start a new cycle. No additional per-element bookkeeping is needed to decide whether a particular element has already been classified: an element is classified exactly when its index is greater than the pointer of its class in the auxiliary vector. To show this, consider the index of the input vector that the algorithm is currently processing, and call this index i. The elements before index i have already been classified and inserted into their correct classes. Suppose that i is greater than the pointer of the class of the element at index i, and assume, for the sake of contradiction, that this element is unclassified; it could then legitimately be inserted at the index given by its class pointer, where it would displace an element already classified into another class. This is impossible, since the initial pointer of each class points to the class's upper bound, which ensures that exactly the space required for each class is reserved in the input vector. Therefore every member of the class of this element, including the element itself, has already been classified. Conversely, once an element has been classified, the pointer of its class has been decremented and thus lies below the element's new index.
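
Putting the pieces together, a complete driver is sketched below under the same assumptions as the previous fragments (flashsort is an illustrative name, not Neubert's published code). The closing insertion sort corresponds to the step described under Concept; the default class count anticipates the choice of m discussed under Performance / complexity:

    def flashsort(data, m=None):
        n = len(data)
        if n < 2:
            return data
        if m is None:
            m = max(2, int(0.43 * n))   # class count, see below
        a_min, a_max = min(data), max(data)
        if a_min == a_max:
            return data                 # all keys equal, nothing to do
        ptr = class_pointers(data, m, a_min, a_max)
        permute_in_place(data, ptr, m, a_min, a_max)
        # Insertion sort over the whole vector: cheap, because after the
        # permutation inversions remain only within individual classes.
        for i in range(1, n):
            key = data[i]
            j = i - 1
            while j >= 0 and data[j] > key:
                data[j + 1] = data[j]
                j -= 1
            data[j + 1] = key
        return data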

Performance / complexity

Performance considerations include the main memory requirement and the time needed for sorting. The only additional memory required is for the auxiliary vector that stores the class bounds, plus a constant number of scalar variables used during the computation.

In the ideal case of a balanced input data set, each class has approximately the same size, and sorting an individual class has complexity O(1). If the number of classes is chosen proportional to the size of the input data set, the total running time is O(n). In the worst case, when all elements end up in one or very few classes, the overall complexity of the algorithm is dominated by the insertion sort within the classes; in this case one obtains O(n²). Variants of the algorithm improve the performance in this case by using better algorithms for sorting within the classes, e.g. quicksort, or by applying Flashsort recursively to classes that exceed a certain size.
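
One such variant is sketched below, under the same assumptions as before (flashsort_hybrid and threshold are hypothetical names, and Python's built-in sort stands in for quicksort): classes that are still large after the permutation are handed to the fallback algorithm, the rest to insertion sort.

    def flashsort_hybrid(data, m=None, threshold=32):
        n = len(data)
        if n < 2:
            return data
        if m is None:
            m = max(2, int(0.43 * n))
        a_min, a_max = min(data), max(data)
        if a_min == a_max:
            return data
        ptr = class_pointers(data, m, a_min, a_max)
        upper = list(ptr)               # remember the class upper bounds
        permute_in_place(data, ptr, m, a_min, a_max)
        start = 0
        for k in range(m):
            end = upper[k] + 1          # class k occupies start..end-1
            if end - start > threshold:
                data[start:end] = sorted(data[start:end])
            else:
                for i in range(start + 1, end):   # insertion sort
                    key = data[i]
                    j = i - 1
                    while j >= start and data[j] > key:
                        data[j + 1] = data[j]
                        j -= 1
                    data[j + 1] = key
            start = end
        return data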

For the correct choice of m, the number of classes, the time for classifying the elements (favoring small m) and the time for sorting within the classes (favoring large m) must be balanced against each other. Based on his research results, Neubert found m = 0.43 n to be optimal.

With respect to main memory, Flashsort avoids the overhead that the closely related bucket sort requires for storing the classes. For uniformly distributed data, Flashsort is faster than heapsort for all n and faster than quicksort for n > 80; at approximately n = 10000 it is twice as fast as quicksort.

Because of the classification process, Flashsort is not stable.

Distinction

The algorithm presented by Neubert as Flashsort shares its name with an older randomized sorting algorithm for parallel computer architectures.
