Sorting algorithm
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output.

Classification

Sorting algorithms used in computer science are often classified by:
  • Computational complexity (worst, average and best behaviour) in terms of the size of the list (n). Typically, good behaviour is O(n log n) and bad behaviour is Ω(n²); ideal behaviour for a sort is O(n). Sort algorithms that use only an abstract key-comparison operation always need at least Ω(n log n) comparisons on average;
  • Memory usage (and use of other computer resources);
  • Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, R will appear before S in the sorted list;
  • General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort; selection sorts include selection sort and heapsort.


When equal elements are indistinguishable, such as with integers, stability is not an issue. However, assume that the following pairs of numbers are to be sorted by their first coordinate:

(4, 1)  (3, 1)  (3, 7)  (5, 6)


In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:

(3, 1)  (3, 7)  (4, 1)  (5, 6)   (order maintained)
(3, 7)  (3, 1)  (4, 1)  (5, 6)   (order changed)


Unstable sorting algorithms may change the relative order of records with equal keys, but stable sorting algorithms never do so. Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using each entry's position in the original data as a tie-breaker. Remembering these positions, however, often incurs an additional space cost.
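The index tie-breaker technique can be sketched in Python. The `stabilize` helper and its names are illustrative, not from the original article; Python's built-in `sorted` is already stable, so here it merely stands in for an arbitrary tuple-comparing sort:

```python
def stabilize(sort, items, key):
    """Run an arbitrary (possibly unstable) sort stably by extending
    each key with the element's original index as a tie-breaker."""
    decorated = [(key(x), i, x) for i, x in enumerate(items)]
    # Tuples compare element-wise: equal keys fall back to the index,
    # so the comparison never reaches the payload itself.
    decorated = sort(decorated)
    return [x for _, _, x in decorated]

pairs = [(4, 1), (3, 1), (3, 7), (5, 6)]
result = stabilize(sorted, pairs, key=lambda p: p[0])
print(result)  # [(3, 1), (3, 7), (4, 1), (5, 6)] — original order of the 3s kept
```

The extra index column is the space penalty mentioned above: O(n) integers alongside the data.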

List of sorting algorithms

In the lists below, n is the number of records to be sorted and k is the number of distinct keys.

Stable

  • Bubble sort — O(n²)
  • Cocktail sort (bidirectional bubble sort) — O(n²)
  • Insertion sort — O(n²)
  • Bucket sort — O(n); requires O(k) extra memory
  • Counting sort — O(n+k); requires O(n+k) extra memory
  • Merge sort — O(n log n); requires O(n) extra memory
  • In-place merge sort — O(n²)
  • Binary tree sort — O(n log n); requires O(n) extra memory
  • Pigeonhole sort — O(n+k); requires O(k) extra memory
  • Radix sort — O(n·k); requires O(n) extra memory
  • Gnome sort — O(n²)
  • Library sort — O(n log n) with high probability; requires (1+ε)n extra memory

Unstable

  • Selection sort — O(n²)
  • Shell sort — O(n log² n) with suitable gap sequences
  • Comb sort — O(n²) worst case; close to O(n log n) in practice
  • Heapsort — O(n log n)
  • Smoothsort — O(n log n); O(n) if the input is (almost) sorted
  • Quicksort — O(n log n) expected time, O(n²) worst case; generally believed to be the fastest known sort for large, random lists
  • Introsort — O(n log n)
  • Patience sorting — O(n log n + k) worst-case time; requires additional O(n + k) space; also finds the longest increasing subsequences

Impractical sort algorithms

  • Bogosort — O(n × n!) expected time; unbounded worst case
  • Stupid sort — O(n³); the recursive version requires O(n²) extra memory
  • Bead sort — O(n) or O(√n), but requires specialized hardware
  • Pancake sorting — O(n), but requires specialized hardware
  • Stooge sort — O(n^2.71)

Summaries of the popular sorting algorithms

Bubble sort

Bubble sort is the most straightforward and simplistic method of sorting data that could actually be considered for real-world use. The algorithm starts at the beginning of the data set, compares the first two elements, and swaps them if the first is greater than the second. It continues doing this for each pair of adjacent elements to the end of the data set, then starts again with the first two elements, repeating until no swaps occur on a full pass. This algorithm is highly inefficient, however, and is rarely used except in education (e.g. introductory programming classes). A slightly better variant, generally called cocktail sort, works by inverting the ordering criteria and the pass direction on alternating passes.
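A minimal Python sketch of the procedure just described, including the early exit once a pass makes no swaps:

```python
def bubble_sort(a):
    """In-place bubble sort. Each pass bubbles the largest remaining
    element to the end; stops as soon as a pass makes no swaps."""
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        for i in range(n - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        n -= 1  # the tail is now sorted; shrink the scanned region
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```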

Insertion sort

Insertion sort is similar to bubble sort, but is more efficient as it reduces the number of element comparisons somewhat with each pass. Each element is compared against the already-sorted portion of the list, scanning backwards until an element no greater than it is found, and it is inserted at that position. Although this algorithm is more efficient than bubble sort, it is still inefficient compared to many other sort algorithms, since it, like bubble sort, moves elements only one position at a time. However, insertion sort is a good choice for small lists (about 30 elements or fewer) and for nearly sorted lists. These observations can be combined to create a variant of insertion sort that works efficiently for larger lists: Shell sort (see below).
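A sketch of the backwards scan-and-insert step in Python:

```python
def insertion_sort(a):
    """In-place, stable insertion sort: grow a sorted prefix one
    element at a time."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one slot right...
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        # ...then drop x into the gap that opened up.
        a[j + 1] = x
    return a

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```

Using `>` (not `>=`) in the scan keeps equal elements in their original order, which is what makes the sort stable.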

Shell sort

Shell sort was invented by Donald Shell in 1959. It improves upon bubble sort and insertion sort by moving out-of-order elements more than one position at a time. One implementation can be described as arranging the data sequence in a two-dimensional array (in reality, an appropriately indexed one-dimensional array) and then sorting the columns of the array using insertion sort. Although this method is inefficient for large data sets, it is one of the fastest algorithms for sorting small numbers of elements (sets with fewer than 1000 or so elements). Another advantage of this algorithm is that it requires relatively little memory.
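A sketch using Shell's original halving gap sequence (better sequences exist and change the asymptotic behaviour; this is the simplest to state). Sorting each "column" of stride `gap` is just a gapped insertion sort:

```python
def shell_sort(a):
    """In-place Shell sort with Shell's original gaps n/2, n/4, ..., 1.
    Each pass is an insertion sort over elements gap apart, so the
    final gap=1 pass is a plain insertion sort on nearly sorted data."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]  # shift within the gapped column
                j -= gap
            a[j] = x
        gap //= 2
    return a

print(shell_sort([9, 8, 3, 7, 5, 6, 4, 1]))  # [1, 3, 4, 5, 6, 7, 8, 9]
```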

See in-place algorithm for a list of sorting algorithms that can be written to work in-place.

Merge sort

Merge sort takes advantage of the ease of merging already-sorted lists into a new sorted list. It starts by comparing every two elements (i.e. 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on, until at last two lists are merged into the final sorted list.
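The text describes the bottom-up order of merging; the recursive top-down variant below is equivalent in spirit and shows the merge step itself:

```python
def merge_sort(a):
    """Top-down merge sort; returns a new sorted list (stable)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge: repeatedly take the smaller head of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # one side is empty; append the rest

print(merge_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

The O(n) extra memory noted in the list above shows up here as the temporary `merged` list (plus the slices).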

Heapsort

Heapsort is a member of the family of selection sorts. This family of algorithms works by determining the largest (or smallest) element of the list, placing it at the end (or beginning) of the list, and then continuing with the rest of the list. Straight selection sort runs in O(n²) time, but heapsort accomplishes the task in O(n log n) time by using a data structure called a heap: a binary tree in which each parent is larger than either of its children. Once the data list has been made into a heap, the root node is guaranteed to be the largest element. It is removed and placed at the end of the list, and the remaining list is then "heapified" again.
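A sketch of the build-heap / extract-max cycle described above, with the heap stored implicitly in the array (children of index i at 2i+1 and 2i+2):

```python
def heapsort(a):
    """In-place heapsort: build a max-heap over the array, then
    repeatedly swap the root (largest element) to the end of the
    shrinking unsorted region and restore the heap."""
    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds on a[..end].
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1  # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # heapify in O(n)
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]          # move max into final place
        sift_down(0, end - 1)                # re-heapify the remainder
    return a

print(heapsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```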

Quicksort

Quicksort is a divide-and-conquer algorithm which takes an element from an array and places it in its final position while at the same time partitioning the array so that all elements on one side of the partition element are larger and all elements on the other side are smaller. It then sorts each sub-array (partition) recursively. The partition step is usually implemented by scanning up from the bottom of the array until an element larger than the partition element is found, then scanning down from the top for an element smaller than the partition element; these two elements are then swapped. When the scans from bottom and top meet, the partition element is swapped into its final position. Quicksort is unstable, more complex than the elementary sorts, and has poor worst-case performance if implemented naively, but it has many advantages that make it the most frequently used sorting algorithm, especially in standard libraries (the C standard library's sort function is even called qsort). Quicksort sorts in place, so it has modest memory requirements and does not involve copying; if implemented carefully, bad worst-case performance is extremely unlikely, and in practice it is often the fastest general-purpose sorting method. Quicksort runs in O(n log n) expected time.
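A minimal sketch; for simplicity this uses a Lomuto-style partition (last element as pivot) rather than the two-ended scan described above, but the pivot still lands in its final position after each partition. A naive fixed pivot choice like this is exactly what exposes the O(n²) worst case on already-sorted input:

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort, Lomuto partition, last element as pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:          # grow the "smaller than pivot" region
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]     # pivot swapped into its final position
        quicksort(a, lo, i - 1)       # recurse on each side of the pivot
        quicksort(a, i + 1, hi)
    return a

print(quicksort([3, 6, 1, 8, 2]))  # [1, 2, 3, 6, 8]
```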

Radix sort

Some radix sort algorithms are counterintuitive, but they can be surprisingly efficient. If we take the list to be sorted as a list of binary strings, we can sort them on the least significant bit, preserving their relative order. This "bitwise" sort must be stable, otherwise the algorithm will not work. Then we sort them on the next bit, and so on from right to left, and the list will end up sorted. This is most efficient on a binary computer (which nearly all computers are). If we had a ternary computer, we would regard the list as a list of base 3 strings and proceed in the same way. Most often, the bucket sort algorithm is used to accomplish the bitwise sorting.
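The right-to-left scheme above can be sketched for non-negative integers, using a stable two-bucket split (zeros, then ones) for each bit:

```python
def radix_sort(nums):
    """LSD (right-to-left) binary radix sort on non-negative integers.
    Each pass stably partitions on one bit, so earlier passes' order
    is preserved among keys with equal higher bits."""
    max_bits = max(nums).bit_length() if nums else 0
    for b in range(max_bits):
        zeros, ones = [], []
        for x in nums:  # stable split on bit b: appends keep input order
            (ones if (x >> b) & 1 else zeros).append(x)
        nums = zeros + ones
    return nums

print(radix_sort([170, 45, 75, 90, 2, 24, 802, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Note how the correctness argument in the text maps directly to the code: the per-bit split must be stable, or ties on the current bit would scramble the order established by the previous passes.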

Radix sort can also be accomplished from left to right, but this makes the algorithm recursive. On a binary (radix 2) computer, we would have to sort on the leftmost bit, and then sort the sublist with 0 as the leftmost bit, and then sort the sublist with 1 as the leftmost bit, and so on.

Memory Usage Patterns and Index Sorting

When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical. In this scenario, the total number of comparisons becomes (relatively) less important, and the number of times sections of memory must be copied or swapped to and from disk can dominate the performance characteristics of an algorithm. Thus, the number of passes and the localization of comparisons can be more important than the raw number of comparisons, since comparisons of nearby elements happen at system bus speed (or, with caching, even at CPU speed), which, compared to disk speed, is virtually instantaneous.

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but because of the way its recursion revisits portions of the array, it becomes much less practical when the array does not fit in RAM, as it may trigger many slow reads and writes to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.

One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem.
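The index-sorting idea can be sketched in Python. The `records` list and its `"key"` field are hypothetical stand-ins for large database records:

```python
# Hypothetical large records sorted by a small key field.
records = [
    {"key": 30, "payload": "large record A"},
    {"key": 10, "payload": "large record B"},
    {"key": 20, "payload": "large record C"},
]

# Sort only a list of positions, not the records themselves: the index
# is just n small integers, so it fits in memory even when records don't.
index = sorted(range(len(records)), key=lambda i: records[i]["key"])
print(index)  # [1, 2, 0]

# If a fully sorted copy is ever needed, one pass over the index suffices.
ordered = [records[i] for i in index]
```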

Another technique for overcoming the memory-size problem is to combine two algorithms in a way that takes advantages of the strength of each to improve overall performance. For instance, the array might be subdivided into chunks of a size that will fit easily in RAM (say, a few thousand elements), the chunks sorted using an efficient algorithm (such as quicksort or heapsort), and the results merged as per mergesort. This is more efficient than just doing mergesort in the first place, but it requires less physical RAM (to be practical) than a full quicksort on the whole array.
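A sketch of this hybrid in Python, using the built-in `sorted` as the stand-in for the efficient in-RAM sort and `heapq.merge` for the mergesort-style combine step (in a real external sort the chunks would live in files, not a list):

```python
import heapq

def chunked_sort(data, chunk_size):
    """Sort fixed-size chunks independently, then k-way merge the
    sorted runs. heapq.merge streams the merge, touching each run
    in order rather than re-sorting everything."""
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    return list(heapq.merge(*runs))

print(chunked_sort([9, 4, 7, 1, 8, 2, 5], chunk_size=3))
# [1, 2, 4, 5, 7, 8, 9]
```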

Techniques can also be combined. For sorting really enormous amounts of data that completely dwarf system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.

Graphical representations

Microsoft's "Quick" programming languages (such as QuickBASIC and QuickPascal) have a file named "sortdemo" (with extension BAS and PAS for QB and QP, respectively) in the examples folder that provides a graphical representation of several of the various sort procedures described here, as well as performance ratings of each.

Also, a program by Robb Cutler called The Sorter for classic Mac OS performs a similar function. It illustrates Quick sort, Merge sort, Heap sort, Shell sort, Insertion sort, Bubble sort, Shaker sort, Bin sort, and Selection sort.

Finally, at the University of British Columbia, Jason Harrison has a page graphically demonstrating the activity of various in-place sorts.

See also

  • Big O notation
  • External sorting
  • Sorting networks (compare)
  • Collation
  • Schwartzian Transform

External links and references

  • D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching.
  • http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/algoen.htm has explanations and analyses of many of these algorithms.
  • http://www.aeriesoft.ru/Projects/SortAlg/ has information on many of these algorithms.
  • Ricardo Baeza-Yates' sorting algorithms on the Web
  • 'Dictionary of Algorithms, Data Structures, and Problems'
  • For some slides and PDFs, see Manchester University's course notes
  • For a repository of algorithms with source code and lectures, see The Stony Brook Algorithm Repository
  • Graphical Java illustrations of the Bubble sort, Insertion sort, Quicksort, and Selection sort
  • xSortLab - An interactive Java demonstration of Bubble, Insertion, Quick, Select and Merge sorts, which displays the data as a bar graph with commentary on the workings of the algorithm printed below the graph.
  • Sorting Algorithms Demo - Java applets that chart the progress of several common sorting algorithms while sorting an array of data using in-place algorithms.
  • http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/sortcontest/sortcontest.htm - An applet visually demonstrating a contest between a number of different sorting algorithms
  • sortchk - a sort algorithm test suite released under the terms of the BSD License (original)
  • The Three Dimensional Bubble Sort- A method of sorting in three or more dimensions (of questionable merit)
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by an online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of Wikipedia articles provide accurate and timely information, please do not assume the accuracy of any particular article. This article is distributed under the terms of the GNU Free Documentation License.