James Cook University - Department of Computer Science

CP2001 Data Structures and Algorithms

Sorting Algorithms

next
Next

Some Efficent Sorting Algorithms

Chapt 18

The main problem with simple sorting algorithms covered in CP1300 (such as bubble sort, insertion sort and selection sort) is their poor run-time performance -> requiring average-case O(N^2) comparisons.

The O(N^2) may not be a problem for small N (indeed, insertion sort is the best technique to use for N<20).

We consider some sorting algorithms that have a run-time performance better than O(N^2).

One of these we have already seen previously - heap sort, which achieves a guaranteed worst-case O(N log N) run-time and a not-too-bad average run-time.

In fact, heap sort achieves a much better spread of running times (ie, between the worst-case and best-case behaviour) than quick sort.
However, the average-case performance of quick sort is about twice that of heap sort (but still O(N log N)).

We now briefly look at a few other sorting algorithms:

Tree Sort

Tree sort sorts by inserting the unsorted values into a balanced binary tree (say, an AVL-tree) and then doing an inorder traversal of the tree. The inorder traversal visits the nodes in sorted order.

An example simple algorithm:

    tree = AVL_tree_create()          // O(1) cost 
    for i = 1 to N do                 // O(N log N) cost 
        AVL_insert(tree, A[i])          
    AVL_traverse_inorder(tree)        // O(N) cost 

The overall cost is dominated by the time to build the tree of N items by repeated insertion into the tree (and local rotations for an AVL tree, or colour flips for a red-black tree).
That is, the overall cost is O(N log N).

Merge Sort

Merge sort is a divide and conquer sorting technique.

The idea behind a divide and conquer technique is to have two phases to sorting:

  1. the divide phase, and

  2. the conquer phase.

The divide phase repeatedly divides the problem into smaller sub-problems until the problem is small enough to solve.

The conquer step solves the simpler sub-problems and reconstructs a solution to the overall problem.

As you can imagine, divide and conquer techniques are typically recursive in nature.

In merge sort we assume we know:

The idea behind merge sort to to divide the problem in half, recursively, until there is one item to sort, then merge the sorted items when returning from the recursion.

Merge sort uses the following recursive procedure:

   mergeSort(List)
   {
       if (List is a single item) return;  //go back up recursion tree
       SplitList(List, unsortedListL, unsortedListR);
       A = mergeSort(unsortedListL);
       B = mergeSort(unsortedListR);
       return mergeLists(A,B);
   }
where SplitList() splits the unsorted list, List, into two equal sub-lists (unsortedListL and unsortedListR), and where mergeLists merges the two sorted sub-lists A and B.

Due to the tree-like recursive nature of divide-and-conquer, the overall run-time complexity is O(N log N).
This is due to the O(N) comparisons required at each level of the recursion tree and the existence of O(log N) levels in the tree.

Note that merge sort guarantees O(N log N) worst-case behaviour - regardless of the distribution of the original data (eg, random, partially sorted, or already sorted).

In fact, merge sort has one the best worst-case behaviours of all sorting algorithms (for moderate-to-large N).

Disadvantages:

Quick Sort

Quick sort is another recursive divide-and-conquer technique.

In each round of quick sort:

  1. choose a pivot amongst the unsorted values (say, the first value)

  2. move all values less than the pivot before the pivot

  3. move all values greater than the pivot after the pivot

  4. pivot is now in correct place in sorted order

  5. quicksort the values before the pivot

  6. quicksort the values after the pivot

The first steps involving locating the pivot in its correct (final and sorted) position requires O(N) comparisons.

The last two steps perform the recursion (quick sort on the two sub-arrays of values - one having values less than the pivot, the other having values greater than the pivot).
This, as in the case of merge sort, consists of O(log N) recursions.

So, the overall run-time complexity is O(N log N).

The average-case and best-case performances are both O(N log N).
In fact, quick sort has the best average-case behaviour of all sorting algorithms.

Unfortunately, if the array of values is sorted (or nearly sorted), the performance may go down to O(N^2) - due to the continuous scan of the pivot all the way to the other side of the array when the array is pre-sorted.

Therefore quick sort cannot guarantee a O(N log N) performance for all array types.

Also, like merge sort, quick sort uses the stack => stack overheads.


Copyright © 1998 Olivier de Vel. All rights reserved.
next
Next