七、排序(4)——qsort()
一、回顧
| 冒泡排序 | O(n2) | √ | √ |
| 插入排序 | O(n2) | √ | √ |
| 選擇排序 | O(n2) | × | √ |
| 快速排序 | O(nlogn) | × | √ |
| 歸并排序 | O(nlogn) | √ | × |
| 計(jì)數(shù)排序 | O(n+k),k是數(shù)據(jù)范圍 | √ | × |
| 桶排序 | O(n) | √ | × |
| 基數(shù)排序 | O(dn) d是維度 | √ | × |
- 線性排序算法的時(shí)間復(fù)雜度比較低,適用場景比較特殊。
- 小規(guī)模數(shù)據(jù)進(jìn)行排序 ==》時(shí)間復(fù)雜度是 O(n2) 的算法;
- 大規(guī)模數(shù)據(jù)進(jìn)行排序 ==》時(shí)間復(fù)雜度是O(nlogn) 的算法 ==》一般也是首選
二、快速排序的優(yōu)化
關(guān)鍵點(diǎn):合理選擇分區(qū)點(diǎn),來避免時(shí)間復(fù)雜度退化為O(n2)
理想的分區(qū)點(diǎn):被區(qū)分點(diǎn)分開的兩個(gè)分區(qū)中,數(shù)據(jù)的數(shù)量差不多。
==》常用、簡單的分區(qū)算法
1、三數(shù)取中法
從區(qū)間的首、尾、中間,分別取出一個(gè)數(shù),然后進(jìn)行比較,取三個(gè)數(shù)的中位數(shù)作為分區(qū)點(diǎn)。
==》擴(kuò)展:數(shù)據(jù)大時(shí),“五數(shù)取中”、“十?dāng)?shù)取中”
2、隨機(jī)法
每次從要排序的區(qū)間中,隨機(jī)選擇一個(gè)元素作為分區(qū)點(diǎn)。
特點(diǎn):雖不能保證每次分區(qū)點(diǎn)都選的比較好,但是從概率角度分析,不太可能出現(xiàn)每次分區(qū)點(diǎn)都很差的情況。
3、警惕堆棧溢出
快速排序使用堆棧實(shí)現(xiàn)的。==》 遞歸要警惕堆棧溢出
解決方法:
(1)限制遞歸深度。一旦遞歸過深(超過事先設(shè)定的閾值),就停止遞歸。
(2)通過在堆上模擬實(shí)現(xiàn)一個(gè)函數(shù)調(diào)用棧,手動模擬遞歸壓棧、出棧的過程,使得系統(tǒng)棧大小沒有限制。
三、qsort()函數(shù)的分析
1、源碼
/* Copyright (C) 1991,1992,1996,1997,1999,2004 Free Software Foundation, Inc.This file is part of the GNU C Library.Written by Douglas C. Schmidt (schmidt@ics.uci.edu).The GNU C Library is free software; you can redistribute it and/ormodify it under the terms of the GNU Lesser General PublicLicense as published by the Free Software Foundation; eitherversion 2.1 of the License, or (at your option) any later version.The GNU C Library is distributed in the hope that it will be useful,but WITHOUT ANY WARRANTY; without even the implied warranty ofMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNULesser General Public License for more details.You should have received a copy of the GNU Lesser General PublicLicense along with the GNU C Library; if not, write to the FreeSoftware Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA02111-1307 USA. *//* If you consider tuning this algorithm, you should consult first:Engineering a sort function; Jon Bentley and M. Douglas McIlroy;Software - Practice and Experience; Vol. 23 (11), 1249-1265, 1993. */#include <alloca.h> #include <limits.h> #include <stdlib.h> #include <string.h>/* Byte-wise swap two items of size SIZE. */ /* 以字節(jié)為單位交換兩個(gè)SIZE長度數(shù)組的SWAP宏 */ #define SWAP(a, b,size) do { register size_t __size = (size);register char *__a = (a), *__b = (b); do{har __tmp = *__a;*__a++ = *__b;*__b++ = __tmp;} while (--__size > 0); } while (0)/* Discontinue quicksort algorithm when partition gets below this size.This particular magic number was chosen to work best on a Sun 4/260. */ /* 當(dāng)一個(gè)數(shù)據(jù)段的長度小于這個(gè)值的時(shí)候,將不用快排對其進(jìn)行排序。因?yàn)檫@個(gè)特別的的魔術(shù)數(shù)在Sun 4/260下的性能最好*/ #define MAX_THRESH 4 /* Stack node declarations used to store unfulfilled partition obligations. */ /* 用來存儲還沒處理的數(shù)據(jù)段索引*/ typedef struct {char *lo;char *hi; } stack_node;/* The next 4 #defines implement a very fast in-line stack abstraction. */ /* 下面四個(gè)宏實(shí)現(xiàn)了一個(gè)非常快速的棧(的確很快,但是現(xiàn)實(shí)中真要這樣做,請確保1. 這是核心代碼。2. 這些代碼百分比正確。要不然出錯(cuò)了,調(diào)試調(diào)死你,維護(hù)的人罵死你。)*/ /* The stack needs log (total_elements) entries (we could even subtract log(MAX_THRESH)).Since total_elements has type size_t, we get as upper bound for log (total_elements):bits per byte (CHAR_BIT) * sizeof(size_t). */ /* 棧需要log(total_elements)個(gè)元素(當(dāng)然我們也可以在這個(gè)基礎(chǔ)上減去log(MAX_THRESH)。)(PS:log(x/y) = log(x) - log(y))因?yàn)閭魅霐?shù)據(jù)個(gè)數(shù)total_elements的類型是size_t,所以我們可以把棧元素上限設(shè)為:size_t的位數(shù)(CHAR_BIT) * sizeof(size_t)。PS:棧用來記錄的是還沒處理的數(shù)據(jù)段索引,最壞的情況是分開的兩個(gè)數(shù)據(jù)段其中一個(gè)已經(jīng)不用再分,這個(gè)時(shí)候棧不許要任何記錄。最好的情況是進(jìn)行l(wèi)og(total_elements)次劃分,此時(shí)棧需要記錄log(total_elements)個(gè)索引,但是這個(gè)算法在一個(gè)分片的元素個(gè)數(shù)小于MAX_THRESH便不再劃分,所以實(shí)際只需log(total_elements / MAX_THRESH)個(gè)空間。CHAR_BIT在limits.h有定義,意思是一個(gè)字節(jié)有多少位,因?yàn)閟izeof是算出一種類型占幾個(gè)字節(jié),所以(CHAR_BIT) * sizeof(size_t)是當(dāng)total_elements去最大值的值,也就是這里棧元素個(gè)數(shù)的上限。*/ #define STACK_SIZE (CHAR_BIT * sizeof(size_t)) #define PUSH(low, high) ((void) ((top->lo = (low)), (top->hi = (high)), ++top)) #define POP(low, high) ((void) (--top, (low = top->lo), (high = top->hi))) #define STACK_NOT_EMPTY (stack < top)/* Order size using quicksort. This implementation incorporatesfour optimizations discussed in Sedgewick:這個(gè)快排的程序?qū)崿F(xiàn)了Sedgewick書中討論的四個(gè)優(yōu)化,下面從大到小說明:(大概這意思...)1. Non-recursive, using an explicit stack of pointer that store thenext array partition to sort. To save time, this maximum amountof space required to store an array of SIZE_MAX is allocated on thestack. Assuming a 32-bit (64 bit) integer for size_t, this needsonly 32 * sizeof(stack_node) == 256 bytes (for 64 bit: 1024 bytes).Pretty cheap, actually.1. 不用遞歸,用了顯示的指針棧存儲下一段要排序的數(shù)據(jù)。為了節(jié)省時(shí)間,為棧申請了最大的存儲空間。假設(shè)size_t是一個(gè)32位(64位)的整數(shù),這里僅需要 32 * sizeof(stack_node) = 256 bytes(對于64位:1024bytes)。事實(shí)上很小。(一個(gè)棧節(jié)點(diǎn)兩指針,32位就是2 * 4 個(gè)字節(jié),64位是8 * 2位字節(jié))2. Chose the pivot element using a median-of-three decision tree. This reduces the probability of selecting a bad pivot value andeliminates certain extraneous comparisons.2. 用中值決策樹選擇關(guān)鍵值。這減小了選擇一個(gè)差關(guān)鍵值的可能性和排除特定的無關(guān)的比較。3. Only quicksorts TOTAL_ELEMS / MAX_THRESH partitions, leavinginsertion sort to order the MAX_THRESH items within each partition.This is a big win, since insertion sort is faster for small, mostlysorted array segments.3. 只用快排對TOTAL_ELEMS / MAX_THRESH個(gè)數(shù)據(jù)段進(jìn)行了排序,用插入排序?qū)γ總€(gè)數(shù)據(jù)段的MAX_THRESH個(gè)數(shù)據(jù)進(jìn)行排序。這是一個(gè)很好的改進(jìn),因?yàn)椴迦肱判蛟谔幚硇〉摹⒒居行虻臄?shù)據(jù)段時(shí)比快排更快。 4. The larger of the two sub-partitions is always pushed onto thestack first, with the algorithm then concentrating on thesmaller partition. This *guarantees* no more than log (total_elems)stack size is needed (actually O(1) in this case)!4. 大的數(shù)據(jù)分段通常先壓入棧內(nèi),算法優(yōu)先處理小的數(shù)據(jù)分段。這就保證棧的元素不會超過log(total_elems)(事實(shí)上這里只用了常數(shù)個(gè)空間)。 */ void _quicksort (void *const pbase, size_t total_elems, size_t size, __compar_d_fn_t cmp, void *arg) {/* 寄存器指針,最快的指針,當(dāng)然系統(tǒng)不一定會把它放到寄存器。register只是一種建議。*/register char *base_ptr = (char *) pbase;const size_t max_thresh = MAX_THRESH * size;if (total_elems == 0)/* Avoid lossage with unsigned arithmetic below. */return;if (total_elems > MAX_THRESH){char *lo = base_ptr;char *hi = &lo[size * (total_elems - 1)];/* 因?yàn)橛昧松厦鏃5暮?#xff0c;所以下面兩個(gè)變量的名字一定不能改...*/stack_node stack[STACK_SIZE];stack_node *top = stack;PUSH (NULL, NULL);while (STACK_NOT_EMPTY){char *left_ptr;char *right_ptr;/* Select median value from among LO, MID, and HI. RearrangeLO and HI so the three values are sorted. This lowers theprobability of picking a pathological pivot value and skips a comparison for both the LEFT_PTR and RIGHT_PTR in the while loops. *//* 在數(shù)組的第一位、中間一位、最后一位中選出一個(gè)中值。同時(shí)也會對第一位和最后一位進(jìn)行排序以達(dá)到這三個(gè)值都是有序的目的。這降低了選擇一個(gè)很爛的關(guān)鍵值的可能性,同時(shí)也跳過了左指針和右指針在while循環(huán)里面的一次比較。*/char *mid = lo + size * ((hi - lo) / size >> 1);if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)SWAP (mid, lo, size);if ((*cmp) ((void *) hi, (void *) mid, arg) < 0)SWAP (mid, hi, size);elsegoto jump_over;if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)SWAP (mid, lo, size); jump_over:;left_ptr = lo + size;right_ptr = hi - size;/* Here's the famous ``collapse the walls'' section of quicksort.Gotta like those tight inner loops! They are the main reasonthat this algorithm runs much faster than others. *//* 這里就是快排中著名的“collapse the walls(推墻??)”。和那些緊湊的內(nèi)層循環(huán)非常像!它們是這個(gè)算法比其他算法快的主要原因。PS:了解過快排的應(yīng)該對下面這一段都比較熟悉,就是把比關(guān)鍵值小的移到數(shù)組左邊,比關(guān)鍵值大的移到數(shù)組右邊,把數(shù)據(jù)分成大小兩段的過程*/do{while ((*cmp) ((void *) left_ptr, (void *) mid, arg) < 0)left_ptr += size;while ((*cmp) ((void *) mid, (void *) right_ptr, arg) < 0)right_ptr -= size;if (left_ptr < right_ptr){SWAP (left_ptr, right_ptr, size);if (mid == left_ptr)mid = right_ptr;else if (mid == right_ptr)mid = left_ptr;left_ptr += size;right_ptr -= size;}else if (left_ptr == right_ptr){left_ptr += size;right_ptr -= size;break;}}while (left_ptr <= right_ptr);/* Set up pointers for next iteration. First determine whetherleft and right partitions are below the threshold size. If so,ignore one or both. Otherwise, push the larger partition'sbounds on the stack and continue sorting the smaller one. *//* 給下次迭代的指針賦值。首先判斷左右兩段數(shù)據(jù)的元素個(gè)數(shù)是否小于閾值,如果是,跳過這一個(gè)或兩個(gè)分段。如若不是,把大的數(shù)據(jù)段的開始和結(jié)束指針入棧,繼續(xù)劃分小的數(shù)據(jù)段。*/if ((size_t) (right_ptr - lo) <= max_thresh){/* 左右兩個(gè)數(shù)據(jù)段的元素都小于閾值,取出棧中數(shù)據(jù)段進(jìn)行劃分*/if ((size_t) (hi - left_ptr) <= max_thresh)/* Ignore both small partitions. */POP (lo, hi);else/* Ignore small left partition. (只有左邊大于閾值)*/lo = left_ptr;}else if ((size_t) (hi - left_ptr) <= max_thresh)/* Ignore small right partition. (只有右邊大于閾值)*/hi = right_ptr;else if ((right_ptr - lo) > (hi - left_ptr)){/* Push larger left partition indices. *//* 兩個(gè)數(shù)據(jù)段的元素個(gè)數(shù)都大于閾值,大的入棧,小的繼續(xù)劃分。 */PUSH (lo, right_ptr);lo = left_ptr;}else{/* Push larger right partition indices. *//* 兩個(gè)數(shù)據(jù)段的元素個(gè)數(shù)都大于閾值,大的入棧,小的繼續(xù)劃分。 */PUSH (left_ptr, hi);hi = right_ptr;}}}/* Once the BASE_PTR array is partially sorted by quicksort the restis completely sorted using insertion sort, since this is efficientfor partitions below MAX_THRESH size. BASE_PTR points to the beginningof the array to sort, and END_PTR points at the very last element inthe array (*not* one beyond it!). *//* 當(dāng)數(shù)組經(jīng)過快排的排序后,已經(jīng)是整體有序了。剩下的排序由插入排序完成,因?yàn)閿?shù)據(jù)段小于MAX_THRESH時(shí),插入排序效率更高。此時(shí)排序的首指針是數(shù)組的首指針,尾指針是數(shù)組的尾指針(不是倒數(shù)第二個(gè))*/#define min(x, y) ((x) < (y) ? (x) : (y)){char *const end_ptr = &base_ptr[size * (total_elems - 1)];char *tmp_ptr = base_ptr;char *thresh = min(end_ptr, base_ptr + max_thresh);register char *run_ptr;/* Find smallest element in first threshold and place it at thearray's beginning. This is the smallest array element,and the operation speeds up insertion sort's inner loop. *//* 找出第一段的最小一個(gè)值并把它放在數(shù)組的第一個(gè)位置。這是數(shù)組的最小元素(用快排排過,應(yīng)該比較容易理解),這一步可以加入插入排序的內(nèi)層循環(huán)*/for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)if ((*cmp) ((void *) run_ptr, (void *) tmp_ptr, arg) < 0)tmp_ptr = run_ptr;if (tmp_ptr != base_ptr)SWAP (tmp_ptr, base_ptr, size);/* Insertion sort, running from left-hand-side up to right-hand-side. *//* 從左到右執(zhí)行插入排序。*/run_ptr = base_ptr + size;while ((run_ptr += size) <= end_ptr){/*上面說的加速內(nèi)層循環(huán),就是不用在這里判斷*/tmp_ptr = run_ptr - size;/*一直往回找直到找到大于或等于當(dāng)前元素的元素*/while ((*cmp) ((void *) run_ptr, (void *) tmp_ptr, arg) < 0)tmp_ptr -= size;/*把當(dāng)前元素移到找出元素的后面去*/tmp_ptr += size;if (tmp_ptr != run_ptr){char *trav;trav = run_ptr + size;while (--trav >= run_ptr){char c = *trav;char *hi, *lo;/*這個(gè)內(nèi)層循環(huán)只是把每個(gè)元素的最后一位移到后面一個(gè)元素去*/for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)* hi = *lo;*hi = c;}}}} }2、分析
通過多種排序算法實(shí)現(xiàn)的:
- 當(dāng)數(shù)據(jù)量較小是,qsort() 會優(yōu)先使用歸并算法來排序輸入數(shù)據(jù)(原因:空間復(fù)雜度為O(n),空間換時(shí)間)
- 當(dāng)數(shù)據(jù)量太大,qsort() 使用快速排序,分區(qū)點(diǎn)選擇方法:“三數(shù)取中法”
- 遞歸太深會導(dǎo)致堆棧溢出問題的解決方法:qsort()實(shí)現(xiàn)了一個(gè)堆上的棧,手動模擬遞歸來解決該問題。
- 當(dāng)排序的區(qū)間中的元素小于等于4時(shí),qsort()就退化為插入排序,不再使用遞歸來做快速排序。原因:在小規(guī)模數(shù)據(jù)面前,O(n2時(shí)間復(fù)雜度并不一定比O(nlogn)的算法執(zhí)行時(shí)間長)
- 時(shí)間復(fù)雜度代表增長趨勢,但是在表示的過程中,我們會省略低階、系數(shù)和常數(shù)。例:O(knlogn + c)時(shí)間復(fù)雜度中的 k 和 c 可能還是一個(gè)比較大的數(shù)。
總結(jié)
以上是生活随笔為你收集整理的七、排序(4)——qsort()的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 七、排序(3)——线性排序
- 下一篇: 八、二分查找(Binary Search