常见排序算法导读(8)[堆排序]

2019年1月26日 206次阅读来源: veli

堆排序(Heap Sort)之所以让人魂牵梦萦，是因为其实现过程比较复杂，不但包括堆的构造，还包括堆的析构。无论是堆的构造（初始化）还是析构，都离不开堆的调整。而调整堆的方法又分为两种，一种是向上游(Swim)，另一种就是往下沉(Sink)。向上游(Swim)方法通常用在堆的构造过程中，而往下沉(Sink)通常用在堆的析构过程中。在正式介绍堆排序之前，有必要先讲讲堆的基本概念。

P.S.在前段时间找工作的过程中，Intel OTC的一个兄弟就问过我堆排序。可惜我当时对堆排序没有很深入的理解，因为没有很好地把握堆的调整过程。直到我后来看了大师Robert Sedgewick和Kevin Wayne合著的《算法》第4版第2章(2.4 优先级队列)，才恍然大悟！大师们用的是向上游(Bottom-up reheapify (swim))和往下沉(Top-down reheapify (sink))来描述堆的调整过程，可谓形象生动至极！相比之下，在某些国产数据结构教科书中，使用FilterUp()和FilterDown()函数来描述和实现堆的向上调整与向下调整就显得晦涩得多。为了表达对大师们的顶礼膜拜之情，我这周果断地购买了《算法》第4版的英文版。所以， “既要好读书，更要读好书！“

关于堆的定义，维基百科是这么说滴，

A heap is a specialized tree-based data structure that satisfies the heap 
property: If A is a parent node of B, then the key (the value) of node A is 
ordered with respect to the key of node B with the same ordering applying 
across the heap. 

A heap can be classified further as either a "max heap" or a "min heap". 

In a max heap, the keys of parent nodes are always greater than 
or equal to those of the children and the highest key is in the root node. 

In a min heap, the keys of parent nodes are less than
or equal to those of the children and the lowest key is in the root node.

什么是堆(Heap)？

堆本质上是一棵二叉树，而且是完全二叉树。 (注：从严格意义上讲，堆可以是N(>=2)叉树，为简单起见，这里只讨论堆为二叉树的情况。)

堆的分类

堆分为大顶堆(Max Heap)和小顶堆(Min Heap)。

什么是大顶堆(Max Heap)？

大顶堆中的每个结点的关键字都不小于孩子结点(如果有的话)的关键字。例如，（图片来源戳这里）

《常见排序算法导读(8)[堆排序]》

什么是小顶堆(Min Heap)？

小顶堆中的每个结点的关键字都不大于孩子结点(如果有的话)的关键字。例如，(感谢贤妻将上面的Max Heap图PS成Min Heap图，背景透明，高大上！)

《常见排序算法导读(8)[堆排序]》

堆到底有什么用？

堆(Heap)能很好地支持优先级队列(Priority Queue)的基本操作。而优先级队列的应用是非常广泛的，例如：

操作系统的调度器实现
网络传输中的路由选择
… …
针对急诊大厅里的病号们而设计的就诊调度程序(试想想，如果没有实现优先级队列，那该何等可怕？！)

《常见排序算法导读(8)[堆排序]》

那么，什么是优先级队列(Priority Queue)？

优先级队列毫无疑问也是队列，表现形式为FIFO(Fisrt In, First Out)线性结构。但与普通队列不同的是，优先级队列的删除操作比较特殊，总是以队列元素的优先级高低为准，比如总是删除优先级最高的元素或者总是删除优先级最低的元素。而优先级队列的插入操作跟普通队列的插入操作一样，也就是不限制元素的优先级，任何时刻任何优先级的元素都可以入队。

在熟悉了堆的基本概念后，下面以大顶堆(Max Heap)为例讲述堆排序(Heap Sort)。

堆排序(Heap Sort)的基本思想

堆排序(Heap Sort)是一种树形选择排序。

堆的构造：根据初始数组(a[0..n-1])构造一个完全二叉树，保证所有的父结点的关键字都>=它的孩子结点的关键字。
堆的析构：交换最后一个叶子结点(a[n-1])和树根结点(a[0])(关键字最大)，输出结点(a[n-1])，然后把余下的整棵树(a[0..n-2])重新调整为大顶堆。

将析构过程循环往复，当输出完最后一个结点后，这个数组(a[0..n-1])的关键字就满足从小到大的排列顺序了。

典型的堆排序看起来是这样子滴，(图片来源戳这里)

《常见排序算法导读(8)[堆排序]》

文章Heap Data Structures对大顶堆的排序过程有非常好的论述，摘要如下：

1. 大顶堆的构造算法(Max Heap Construction Algorithm) // Swim

Step 1: Create a new node at the end of heap.
Step 2: Assign new value to the node.
Step 3: Compare the value of this child node with its parent.
Step 4: If value of parent is less than child, then swap them.
Step 5: Repeat step 3 & 4 until Heap property holds.

《常见排序算法导读(8)[堆排序]》

2. 大顶堆的删除（析构）算法(Max Heap Deletion Algorithm) // Sink

Step 1: Remove root node.
Step 2: Move the last element of last level to root.
Step 3: Compare the value of this child node with its parent.
Step 4: If value of parent is less than child, then swap them.
Step 5: Repeat step 3 & 4 until Heap property holds.

《常见排序算法导读(8)[堆排序]》

上面的动画我个人非常喜欢，因为对于理解堆排序实在是太贴心了（不得不承认，老外写的文章就是可读性好，最大的特点就是将复杂的东西尽可能地简单化和形象化，很值得我们学习和借鉴）。好了，该上具体的代码实现啦！ (为方便理解，下面给出略微冗长(i.e.不是非常精简)但是可读性良好的C代码实现)

1. 数组与堆的映射表示

/*
 * Assume the size of an array is 9, e.g.
 *
 * Array: a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] // n = 9
 *
 * Heap :        a[0]               --- Level 0 ---
 *                / \
 *               /   \
 *              /     \
 *            a[1]    a[2]          --- Level 1 ---
 *            / \      / \
 *           /   \    /   \
 *         a[3] a[4] a[5] a[6]      --- Level 2 ---
 *         /  \
 *       a[7] a[8]                  --- Level 3 ---
 *
 * For node a[i],  i = 0, 1, 2, ..., N-1
 *
 *       Index of its Left  Child = (i + 1) * 2 - 1 = i * 2 + 1
 *       Index of its Right Child = (i + 1) * 2     = i * 2 + 2
 *       Index of its Parent      = (i + 1) / 2 - 1 = (i - 1) / 2
 *
 * C code:
 *
 *       int getIndexOfLeftChild(i)  { return i * 2 + 1;         }
 *       int getIndexOfRightChild(i) { return i * 2 + 2;         }
 *       int getIndexParent(i)       { return (i - 1) / 2;       }
 *       int hasLeftChild(n, i)      { return ((i * 2 + 1) < n); }
 *       int hasRightChild(n, i)     { return ((i * 2 + 2) < n); }
 *       int hasParent(i)            { return (i > 0);           }
 *       int isLeafNode(n, i)        { return (i >= n / 2);      }
 */

2. 第1个版本的heapsort()函数 [只使用swim(),未使用sink()]

 1 static void exchange(int a[], int i, int j)
 2 {
 3         int t = a[i];
 4         a[i] = a[j];
 5         a[j] = t;
 6 }
 7 
 8 static int getIndexOfParent(int i)
 9 {
10         return (i - 1) / 2;
11 }
12 
13 static void swim(int a[], size_t n, int k)
14 {
15 #define ISNOTROOT(k)    (k != 0)
16         while (ISNOTROOT(k)) {
17                 int parent = getIndexOfParent(k);
18                 if (a[k] <= a[parent])
19                         break;
20 
21                 /* swim only if a[k] > a[parent] */
22                 exchange(a, k, parent);
23                 k = parent;
24         }
25 }
26 
27 static void constructMaxHeap(int a[], size_t n)
28 {
29         for (int i = 0; i < n; i++)
30                 swim(a, i, i);
31 }
32 
33 void heapsort(int a[], size_t n)
34 {
35         constructMaxHeap(a, n);         // firstly construct a max heap
36                                         //
37         while (n > 0) {                 // note a[0] is always the max element
38                 exchange(a, 0, n - 1);  // swap a[0], a[n-1]   then
39                                         //      a[n-1] is      the max one
40                 n--;                    // decrease the size of a[]
41                 constructMaxHeap(a, n); // re-construct the max heap by walking
42                                         //      a[0 .. n-2]
43         }                               //
44 }

说明： 对于L13-L25的swim()函数，a[0 .. k-1]总是满足大顶堆的结构，那么将a[k]插入大顶堆的时候，采用swim（向上游）的方法。如果a[k]比它的parent的关键字要大，就向上游一把(即交换a[k]和它的parent)，如此循环往复，直到a[0 .. k]满足大顶堆结构为止。

如果输入的数组为： int a[] = {0, 1, 2, 3}, 则构造堆的过程可用图推演如下：

《常见排序算法导读(8)[堆排序]》

3. 第2个版本的heapsort()函数 [既使用swim()也使用sink()]

 1 static void exchange(int a[], int i, int j)
 2 {
 3         int t = a[i];
 4         a[i] = a[j];
 5         a[j] = t;
 6 }
 7 
 8 static int getIndexOfParent(int i)
 9 {
10         return (i - 1) / 2;
11 }
12 
13 static int getIndexOfLeftChild(int i)
14 {
15         return i * 2 + 1;
16 }
17 
18 static int getIndexOfRightChild(int i)
19 {
20         return i * 2 + 2;
21 }
22 
23 static void swim(int a[], size_t n, int k)
24 {
25 #define ISNOTROOT(k)    (k != 0)
26         while (ISNOTROOT(k)) {
27                 int parent = getIndexOfParent(k);
28                 if (a[k] <= a[parent])
29                         break;
30 
31                 /* swim only if a[k] > a[parent] */
32                 exchange(a, k, parent);
33                 k = parent;
34         }
35 }
36 
37 static void sink(int a[], size_t n, int k)
38 {
39 #define ISLEAFNODE(n, k) (k >= n / 2)
40         while (!ISLEAFNODE(n, k)) {
41                 int left  = getIndexOfLeftChild(k);
42                 int right = getIndexOfRightChild(k);
43 
44                 int child = -1; /* set to be an invalid index on purpose */
45                 if (left < n && right < n)
46                         child = (a[left] > a[right]) ? left : right;
47                 else if (left < n && right >= n)
48                         child = left;
49                 else if (left >= n && right < n)
50                         child = right;
51                 else
52                         child = k;
53 
54                 if (a[k] >= a[child])
55                         break;
56 
57                 /* sink only if max of children of a[k] say a[child] > a[k] */
58                 exchange(a, k, child);
59                 k = child;
60         }
61 }
62 
63 static void constructMaxHeap(int a[], size_t n)
64 {
65         for (int i = 0; i < n; i++)
66                 swim(a, i, i);
67 }
68 
69 void heapsort(int a[], size_t n)
70 {
71         constructMaxHeap(a, n);         // firstly construct a max heap
72                                         //
73         while (n > 0) {                 // note a[0] is always the max element
74                 exchange(a, 0, n - 1);  // swap a[0], a[n-1]   then
75                                         //      a[n-1] is      the max one
76                 n--;                    // decrease the size of a[]
77                 sink(a, n, 0);          // re-build the max heap by using sink()
78         }                               //      while walking a[0 .. n-2]
79 }

说明： L37-L61中的sink()函数，关键是找出比结点a[k]关键字大的孩子结点。如果a[k]有两个孩子，那么就取出两个孩子结点中关键字最大的那一个孩子结点与a[k]进行比较。设找出的孩子结点为a[child], 如果a[k] < a[child], 就交换a[k]和a[child]。新的k就是child, 如此循环往复，直到a[k]不需要再往下沉为止。(如果a[k]为叶子结点，显然不需要再往下沉。)

如果使用函数constructMaxHeap()完成了堆的构造后的数组为: int a[] = {3, 2, 1, 0}, 则接下来堆排序的过程可用图推演如下：

《常见排序算法导读(8)[堆排序]》

4. 完整的实现

o heapsort.c

  1 #include <stdio.h>
  2 #include <stdlib.h>
  3 
  4 /*
  5  * Assume the size of an array is 9, e.g.
  6  *
  7  * Array: a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] // n = 9
  8  *
  9  * Heap :        a[0]               --- Level 0 ---
 10  *                / \
 11  *               /   \
 12  *              /     \
 13  *            a[1]    a[2]          --- Level 1 ---
 14  *            / \      / \
 15  *           /   \    /   \
 16  *         a[3] a[4] a[5] a[6]      --- Level 2 ---
 17  *         /  \
 18  *       a[7] a[8]                  --- Level 3 ---
 19  *
 20  * For node a[i],  i = 0, 1, 2, ..., N-1
 21  *
 22  *       Index of its Left  Child = (i + 1) * 2 - 1 = i * 2 + 1
 23  *       Index of its Right Child = (i + 1) * 2     = i * 2 + 2
 24  *       Index of its Parent      = (i + 1) / 2 - 1 = (i - 1) / 2
 25  *
 26  * C code:
 27  *
 28  *       int getIndexOfLeftChild(i)  { return i * 2 + 1;         }
 29  *       int getIndexOfRightChild(i) { return i * 2 + 2;         }
 30  *       int getIndexParent(i)       { return (i - 1) / 2;       }
 31  *       int hasLeftChild(n, i)      { return ((i * 2 + 1) < n); }
 32  *       int hasRightChild(n, i)     { return ((i * 2 + 2) < n); }
 33  *       int hasParent(i)            { return (i > 0);           }
 34  *       int isLeafNode(n, i)        { return (i >= n / 2);      }
 35  */
 36 
 37 static void show(int a[], size_t n)
 38 {
 39         for (int i = 0; i < n; i++)
 40                 printf("%-2c ", a[i]);
 41         printf("\n");
 42 }
 43 
 44 static void exchange(int a[], int i, int j)
 45 {
 46         int t = a[i];
 47         a[i] = a[j];
 48         a[j] = t;
 49 }
 50 
 51 static int getIndexOfParent(int i)
 52 {
 53         return (i - 1) / 2;
 54 }
 55 
 56 static int getIndexOfLeftChild(int i)
 57 {
 58         return i * 2 + 1;
 59 }
 60 
 61 static int getIndexOfRightChild(int i)
 62 {
 63         return i * 2 + 2;
 64 }
 65 
 66 static void swim(int a[], size_t n, int k)
 67 {
 68 #define ISNOTROOT(k)    (k != 0)
 69         while (ISNOTROOT(k)) {
 70                 int parent = getIndexOfParent(k);
 71                 if (a[k] <= a[parent])
 72                         break;
 73 
 74                 /* swim only if a[k] > a[parent] */
 75                 exchange(a, k, parent);
 76                 k = parent;
 77         }
 78 }
 79 
 80 static void sink(int a[], size_t n, int k)
 81 {
 82 #define ISLEAFNODE(n, k) (k >= n / 2)
 83         while (!ISLEAFNODE(n, k)) {
 84                 int left  = getIndexOfLeftChild(k);
 85                 int right = getIndexOfRightChild(k);
 86 
 87                 int child = -1; /* set to invalid index on purpose */
 88                 if (left < n && right < n)
 89                         child = (a[left] > a[right]) ? left : right;
 90                 else if (left < n && right >= n)
 91                         child = left;
 92                 else if (left >= n && right < n)
 93                         child = right;
 94                 else
 95                         child = k;
 96 
 97                 if (a[k] >= a[child])
 98                         break;
 99 
100                 /* sink only if max of children of a[k] say a[child] > a[k] */
101                 exchange(a, k, child);
102                 k = child;
103         }
104 }
105 
106 static void constructMaxHeap(int a[], size_t n)
107 {
108         for (int i = 0; i < n; i++)
109                 swim(a, i, i);
110 }
111 
112 #ifdef _USE_SWIM_ONLY_
113 void heapsort(int a[], size_t n)
114 {
115         constructMaxHeap(a, n);
116 
117         while (n > 0) {
118                 exchange(a, 0, n - 1);
119 
120                 n--;
121                 constructMaxHeap(a, n);
122         }
123 }
124 #else
125 void heapsort(int a[], size_t n)
126 {
127         constructMaxHeap(a, n);
128 
129         while (n > 0) {
130                 exchange(a, 0, n - 1);
131 
132                 n--;
133                 sink(a, n, 0);
134         }
135 }
136 #endif
137 
138 int main(int argc, char *argv[])
139 {
140         if (argc < 2) {
141                 fprintf(stderr, "Usage: %s <C1> [C2] ...\n", argv[0]);
142                 return -1;
143         }
144 
145         argc--;
146         argv++;
147 
148         int n = argc;
149         int *a = (int *)malloc(sizeof(int) * n);
150 #define VALIDATE(p) do { if (p == NULL) return -1; } while (0)
151         VALIDATE(a);
152 
153         for (int i = 0; i < n; i++)
154                 *(a+i) = argv[i][0];
155 
156         printf("                ");
157         for (int i = 0; i < n; i++)
158                 printf("%-2d ", i);
159         printf("\n");
160 
161         printf("Before sorting: "); show(a, n);
162         heapsort(a, n);
163         printf("After  sorting: "); show(a, n);
164 
165         free(a); a = NULL;
166         return 0;
167 }

o 编译并测试

$ gcc -g -Wall -m32 -std=c99 -o heapsort heapsort.c

$ ./heapsort    0  1  2  3  4  5  6  7  8  9
                0  1  2  3  4  5  6  7  8  9
Before sorting: 0  1  2  3  4  5  6  7  8  9
After  sorting: 0  1  2  3  4  5  6  7  8  9

$ ./heapsort    9  8  7  6  5  4  3  2  1  0
                0  1  2  3  4  5  6  7  8  9
Before sorting: 9  8  7  6  5  4  3  2  1  0
After  sorting: 0  1  2  3  4  5  6  7  8  9

$ ./heapsort    S  O  R  T  E  X  A  M  P  L  E
                0  1  2  3  4  5  6  7  8  9  10
Before sorting: S  O  R  T  E  X  A  M  P  L  E
After  sorting: A  E  E  L  M  O  P  R  S  T  X

$ gcc -g -Wall -m32 -std=c99 -D_USE_SWIM_ONLY_ -o heapsort heapsort.c
heapsort.c:80:13: warning: ‘sink’ defined but not used [-Wunused-function]
 static void sink(int a[], size_t n, int k)
             ^

$ ./heapsort    S  O  R  T  E  X  A  M  P  L  E
                0  1  2  3  4  5  6  7  8  9  10
Before sorting: S  O  R  T  E  X  A  M  P  L  E
After  sorting: A  E  E  L  M  O  P  R  S  T  X

最后，有必要提一下堆排序的时间复杂度和空间复杂度。注意：堆排序是一种不稳定的排序算法。

Worst-case performance O(N * logN) Best-case performance O(N) Average performance O(N * logN) Worst-case space complexity O(1) auxiliary

参考资料

小结：

到此为止，我们已经完全弄明白了神秘的堆排序(Heap Sort)。归结起来，只要充分掌握了调整堆的两种方法(向上游(Swim)和往下沉(Sink))，堆排序其实很简单。行文至此，我忽然觉得，做人和治学当如堆排序，因为”既要沉得下去，又要浮得上来“。下一节将介绍归并排序(Merge Sort)。

    原文作者：veli
    原文地址: https://www.cnblogs.com/idorax/p/6606990.html
    本文转自网络文章，转载此文章仅为分享知识，如有侵权，请联系博主进行删除。