當(dāng)前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

数据结构--堆 Heap

發(fā)布時間：2024/7/5 编程问答 27 豆豆

生活随笔收集整理的這篇文章主要介紹了数据结构--堆 Heap 小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

文章目錄

- 1. 概念
- 2. 操作和存儲
- - 2.1 插入一個元素
  - 2.2 刪除堆頂元素
- 3. 堆排序（不穩(wěn)定排序）
- - 3.1 建堆
  - 3.2 排序
  - 3.3 思考：為什么快速排序要比堆排序性能好？兩者都是O(nlogn)
- 4. 堆應(yīng)用
- - 4.1 優(yōu)先級隊(duì)列
  - 4.2 用堆求 Top K（前K大數(shù)據(jù)）
  - 4.3 求中位數(shù)
  - 4.4 思考：海量關(guān)鍵詞搜索記錄，求topK

1. 概念

堆是一種特殊的樹
a. 堆是完全二叉樹（除最后一層，其他層都是滿的，最后一層節(jié)點(diǎn)都靠左）
b. 每一個節(jié)點(diǎn)都大于等于（或者都小于等于）其子樹中每個節(jié)點(diǎn)的值

2. 操作和存儲

完全二叉樹適合用數(shù)組存儲，節(jié)省空間（不需要左右指針）

2.1 插入一個元素

2.2 刪除堆頂元素

/*** @description: 堆* @author: michael ming* @date: 2019/5/26 22:22* @modified by: */ #include <iostream> #include <limits.h> using namespace std; class MinHeap {int *arr;int capacity;int heap_size; public:MinHeap(int cap){heap_size = 0;capacity = cap;arr = new int [capacity];}~MinHeap(){delete [] arr;}int heapsize(){return heap_size;}bool full(){return capacity == heap_size;}void MinHeapify(int i){Heapify(i,heap_size);}void Heapify(int i, int size){int l = left(i), r = right(i);int min = i;if(l < size && arr[l] < arr[i])min = l;if(r < size && arr[r] < arr[min])min = r;if(min != i){swap(arr[i], arr[min]);Heapify(min,size);}}int parent(int i){return (i-1)/2;}int left(int i){return 2*i+1;}int right(int i){return 2*i+2;}void delMin(){if(heap_size <= 0)return;arr[0] = arr[heap_size-1];heap_size--;MinHeapify(0);}int getMin(){return arr[0];}void insert(int val){if(heap_size == capacity){cout << "overflow!" << endl;return;}heap_size++;int i = heap_size-1;arr[i] = val;while(i > 0 && arr[parent(i)] > arr[i]){swap(arr[parent(i)], arr[i]);i = parent(i);}}void sort(){int size = heap_size;for(int i = heap_size-1; i >= 0; --i){swap(arr[i], arr[0]);Heapify(0,--size);}}void print(){for(int i = 0; i < heap_size; ++i)cout << arr[i] << " ";} }; int main() {MinHeap minheap(8);minheap.insert(6);minheap.insert(8);minheap.insert(12);minheap.insert(4);minheap.insert(15);minheap.insert(0);minheap.insert(5);minheap.insert(9);minheap.insert(10);minheap.delMin();cout << minheap.getMin() << endl;return 0; }

3. 堆排序（不穩(wěn)定排序）

3.1 建堆

方法1：一個一個的插入這種方式
方法2：從倒數(shù)第一個有葉子節(jié)點(diǎn)的節(jié)點(diǎn)開始，檢查其子節(jié)點(diǎn)是否滿足堆，依次往前，直到堆頂，建堆的復(fù)雜度O(n)

3.2 排序

建堆結(jié)束后，最大元素在堆頂，與最后一個元素交換（不穩(wěn)定），然后對剩余的 n-1 個元素重新構(gòu)建堆，重復(fù)這個過程，最后只剩1個元素，排序結(jié)束，建堆復(fù)雜度O(nlogn)

堆排序代碼見：https://blog.csdn.net/qq_21201267/article/details/80993672#t7

3.3 思考：為什么快速排序要比堆排序性能好？兩者都是O(nlogn)

快排數(shù)據(jù)訪問方式比較友好。快排訪問數(shù)據(jù)是順序訪問，堆排序是跳著訪問，這樣對CPU緩存是不友好的

同樣的數(shù)據(jù)，堆排序中數(shù)據(jù)交換次數(shù)多余快排。快排的交換次數(shù)不會比逆序度多；堆排序第一步建堆，打亂了數(shù)據(jù)原有順序，數(shù)據(jù)有序度降低，交換次數(shù)變多

4. 堆應(yīng)用

4.1 優(yōu)先級隊(duì)列

優(yōu)先級隊(duì)列用堆實(shí)現(xiàn)是最直接、最高效的。優(yōu)先出隊(duì)，就是堆頂取出
a. 合并有序小文件
把多個有序的小文件的第一個元素取出，放入堆中，取出堆頂?shù)酱笪募?#xff0c;然后再從小文件中取出一個加入到堆，這樣就把小文件的元素合并到大文件中了。

/*** @description: 合并有序小文件* @author: michael ming* @date: 2019/5/29 22:10* @modified by: */ #include "heap.cpp" int main() {int file0[10] = {11, 21, 31, 41, 51, 61, 71, 81, 91, 101};int file1[8] = {1, 2, 3, 4, 5, 6, 7, 80};int file2[9] = {13, 23, 33, 43, 53, 63, 73, 83, 93};int file3[10] = {12, 22, 32, 42, 52, 62, 72, 82, 92, 102};int file4[7] = {15, 25, 35, 45, 55, 65, 72};int len0 = 10, len1 = 8, len2 = 9, len3 = 10, len4 = 7;int i0, i1, i2, i3, i4, j;i0 = i1 = i2 = i3 = i4 = j = 0;const int new_len = len0+len1+len2+len3+len4;int bigFile[new_len];MinHeap intheap(5);intheap.insert(file0[i0]);intheap.insert(file1[i1]);intheap.insert(file2[i2]);intheap.insert(file3[i3]);intheap.insert(file4[i4]);int top;while(intheap.heapsize()){top = intheap.getMin();bigFile[j++] = top;intheap.delMin();if(i0 < len0-1 && top == file0[i0]) //可以用list做，就不用查找最小的是哪個文件的{intheap.insert(file0[++i0]);}else if(i1 < len1-1 && top == file1[i1]){intheap.insert(file1[++i1]);}else if(i2 < len2-1 && top == file2[i2]){intheap.insert(file2[++i2]);}else if(i3 < len3-1 && top == file3[i3]){intheap.insert(file3[++i3]);}else if(i4 < len4-1 && top == file4[i4]){intheap.insert(file4[++i4]);}}for(i0 = 1, j = 0; j < new_len; i0++,++j){cout << bigFile[j] << " ";if(i0%10 == 0)cout << endl;}return 0; }

b. 高性能定時器
多個定時器，需要每隔很短的時間（比如1秒）掃描一遍，查詢哪個任務(wù)時間到了，需要開始執(zhí)行，這樣有時候很多掃描是徒勞的，如果任務(wù)列表很長，掃描很耗時。采用小頂堆，最先執(zhí)行的放在堆頂，就只需要在堆頂時間到時執(zhí)行堆頂任務(wù)即可，避免無謂的掃描。

4.2 用堆求 Top K（前K大數(shù)據(jù)）

a. 針對靜態(tài)數(shù)據(jù)（數(shù)據(jù)不變）
建立大小為K的小頂堆，遍歷數(shù)組，數(shù)組元素與堆頂比較，比堆頂大，就把堆頂刪除，并插入該元素到堆
b. 針對動態(tài)數(shù)據(jù)（數(shù)據(jù)不斷插入更新的）
在動態(tài)數(shù)據(jù)插入的時候就與堆頂比較，看是否入堆，始終維護(hù)這個堆，需要的時候直接返回，最壞O(n*lgK)

/*** @description: 用堆求最大的k個元素* @author: michael ming* @date: 2019/5/30 0:06* @modified by: */ #include "heap.cpp" int main() {int i = 0;const int len = 10;int data[len] = {2, 8, 1, 7, 12, 24, 6, 10, 90, 100};MinHeap intheap(5);//求前5大元素while(!intheap.full()){if(i < len)intheap.insert(data[i++]);}while(i < len){if(data[i] > intheap.getMin()){intheap.delMin();intheap.insert(data[i]);}i++;}intheap.sort();intheap.print();return 0; }

4.3 求中位數(shù)

靜態(tài)數(shù)據(jù)：先排序，中間位置的數(shù)就是中位數(shù)
動態(tài)數(shù)據(jù)：動態(tài)變化，中位數(shù)位置總在變動，每次都排序的話，效率很低，借助堆可以高效的解決。

插入數(shù)據(jù)到某個堆里后，兩個堆數(shù)據(jù)量（應(yīng)相等或者大堆比小堆多1）若不滿足括號要求，則將某個堆的堆頂元素移動到另一個堆，直到滿足括號要求，堆化復(fù)雜度O(lgn)，大堆的堆頂就是中位數(shù)，求中位數(shù)復(fù)雜度O(1)。

同理，可以求99%響應(yīng)時間（就是大于前面99%數(shù)據(jù)的那個數(shù)據(jù)）
跟上面過程類似，只是大堆中保存 n x 0.99 個數(shù)據(jù)，小堆存 n x 0.01 個數(shù)據(jù)

/*** @description: 利用大小兩個堆求中位數(shù)* @author: michael ming* @date: 2019/5/30 20:37* @modified by: */ #include <algorithm> #include <iostream> #include <vector> using namespace std; void printVec(vector<int> v) {for (int i = 0; i < v.size(); ++i)cout << v[i] << " ";cout << endl; } int main() {const int len = 7;int staticArr[len] = {7,-1,9,0,8,77,24};//-1,0,7,*8*,9,24,77vector<int> maxheap, minheap;for(int i = 0; i < len; ++i){if(maxheap.empty()){maxheap.push_back(staticArr[i]);continue;}//----選擇插入哪個堆-----if(!maxheap.empty() && staticArr[i] <= maxheap[0]){maxheap.push_back(staticArr[i]);push_heap(maxheap.begin(),maxheap.end());//默認(rèn)采用 < , 大堆}else if(!maxheap.empty() && staticArr[i] > maxheap[0]){minheap.push_back(staticArr[i]);push_heap(minheap.begin(),minheap.end(),greater<int>());//小堆，采用 >}//----平衡兩個堆的節(jié)點(diǎn)比列----if(maxheap.size() > minheap.size() && maxheap.size() - minheap.size() > 1){minheap.push_back(maxheap[0]);//大堆頂進(jìn)入小堆push_heap(minheap.begin(),minheap.end(),greater<int>());pop_heap(maxheap.begin(),maxheap.end());//堆頂?shù)侥┪擦?/span>maxheap.pop_back();//刪除到末尾的"堆頂"}else if(maxheap.size() < minheap.size()){maxheap.push_back(minheap[0]);push_heap(maxheap.begin(),maxheap.end());//默認(rèn)采用 < , 大堆pop_heap(minheap.begin(),minheap.end(),greater<int>());minheap.pop_back();}}if(maxheap.size())cout << "中位數(shù)為：" << maxheap[0] << endl;return 0; }

stl heap操作：http://www.cplusplus.com/reference/algorithm/pop_heap/

對上面程序稍加改造，對動態(tài)數(shù)據(jù)進(jìn)行求解中位數(shù)

/*** @description: 中位數(shù)求解，利用大小堆，動態(tài)數(shù)據(jù)插入* @author: michael ming* @date: 2019/5/30 23:13* @modified by: */ #include <algorithm> #include <iostream> #include <vector> using namespace std; void printVec(vector<int> v) {for (int i = 0; i < v.size(); ++i)cout << v[i] << " ";cout << endl; } int main() {int num;vector<int> maxheap, minheap, allnum;while(cin >> num){allnum.push_back(num);if(maxheap.empty()){maxheap.push_back(num);cout << "排序后的數(shù)組：" << endl;printVec(allnum);cout << "中位數(shù)為：" << maxheap[0] << endl;continue;}//----選擇插入哪個堆-----if(!maxheap.empty() && num <= maxheap[0]){maxheap.push_back(num);push_heap(maxheap.begin(),maxheap.end());//默認(rèn)采用 < , 大堆}else if(!maxheap.empty() && num > maxheap[0]){minheap.push_back(num);push_heap(minheap.begin(),minheap.end(),greater<int>());//小堆，采用 >}//----平衡兩個堆的節(jié)點(diǎn)比列----if(maxheap.size() > minheap.size() && maxheap.size() - minheap.size() > 1){minheap.push_back(maxheap[0]);//大堆頂進(jìn)入小堆push_heap(minheap.begin(),minheap.end(),greater<int>());pop_heap(maxheap.begin(),maxheap.end());//堆頂?shù)侥┪擦?/span>maxheap.pop_back();//刪除到末尾的"堆頂"}else if(maxheap.size() < minheap.size()){maxheap.push_back(minheap[0]);push_heap(maxheap.begin(),maxheap.end());//默認(rèn)采用 < , 大堆pop_heap(minheap.begin(),minheap.end(),greater<int>());minheap.pop_back();}sort(allnum.begin(),allnum.end());cout << "排序后的數(shù)組：" << endl;printVec(allnum);if(maxheap.size())cout << "中位數(shù)為：" << maxheap[0] << endl;}return 0; }

4.4 思考：海量關(guān)鍵詞搜索記錄，求topK

用散列表去重，并累加搜索次數(shù)
再建立大小為K的小頂堆，遍歷散列表，次數(shù)大于堆頂?shù)?#xff0c;頂替堆頂入堆
（以上在散列表很大時，需要內(nèi)存超過單機(jī)內(nèi)存，怎么辦？）
建立n個空文件，對搜索關(guān)鍵詞求哈希值，哈希值對n取模，得到該關(guān)鍵詞被分到的文件號（0到n-1）
對每個文件，利用散列和堆，分別求出topK，然后把n個topK（比如10個Top 20，200很小了吧）放在一起，出現(xiàn)次數(shù)最多的K（20）個關(guān)鍵詞就是這海量數(shù)據(jù)里搜索最頻繁的。

創(chuàng)作挑戰(zhàn)賽新人創(chuàng)作獎勵來咯，堅(jiān)持創(chuàng)作打卡瓜分現(xiàn)金大獎

總結(jié)

以上是生活随笔為你收集整理的数据结构--堆 Heap的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯，歡迎將生活随笔推薦給好友。

上一篇：动态规划应用--搜索引擎拼写纠错
下一篇：贪心算法（Greedy Algorith