Comparing three regular expression engines in C++ (C regex, C++ regex, boost regex)

Published: 2025/3/21

Original article: https://www.cnblogs.com/pmars/archive/2012/10/24/2736831.html

I needed regular expressions in C++ for work, so I looked into the three regex engines named above.

1. C regex

/*
 * written by xingming
 * time: 2012-10-19 15:51:53
 * for: test regex
 */
#include <regex.h>
#include <iostream>
#include <sys/types.h>
#include <stdio.h>
#include <cstring>
#include <sys/time.h>

using namespace std;

const int times = 1000000;

int main(int argc, char** argv)
{
    // Backslashes are doubled so that the regex engine receives "\." (a literal dot).
    char pattern[512] = "finance\\.sina\\.cn|stock1\\.sina\\.cn|3g\\.sina\\.com\\.cn.*(channel=finance|_finance$|ch=stock|/stock/)|dp.sina.cn/.*ch=9&";
    const size_t nmatch = 10;
    regmatch_t pm[10];
    int z;
    regex_t reg;
    char lbuf[256] = "set", rbuf[256];
    char buf[3][256] = {
        "finance.sina.cn/google.com/baidu.com.google.sina.cndddddddddddddddddddddda.sdfasdfeoasdfnahsfonadsdf",
        "3g.com.sina.cn.google.com.dddddddddddddddddddddddddddddddddddddddddddddddddddddbaidu.com.sina.egooooooooo",
        "http://3g.sina.com.cn/google.baiduchannel=financegogo.sjdfaposif;lasdjf.asdofjas;dfjaiel.sdfaosidfj"
    };
    printf("input strings:\n");
    timeval end, start;
    gettimeofday(&start, NULL);
    // Compile once; REG_NOSUB means match positions are not recorded.
    regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB);
    for (int i = 0; i < times; ++i) {
        for (int j = 0; j < 3; ++j) {
            z = regexec(&reg, buf[j], nmatch, pm, REG_NOTBOL);
            /*
            if (z == REG_NOMATCH)
                printf("no match\n");
            else
                printf("ok\n");
            */
        }
    }
    gettimeofday(&end, NULL);
    // Elapsed time in microseconds (long instead of the non-standard uint).
    long time = (end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec;
    cout << time / 1000000 << " s and " << time % 1000000 << " us." << endl;
    regfree(&reg);  // release the compiled regex
    return 0;
}

Using a regular expression breaks down into a few simple steps:

1. Compile the regular expression

2. Run the match

3. Free the memory

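First, compile the regular expression

int regcomp(regex_t *preg, const char *regex, int cflags);

regcomp() compiles the regular expression into an internal form that makes the subsequent matching more efficient.

preg: the regex_t structure that stores the compiled regular expression;

regex: pointer to the regular expression string;

cflags: compile mode

There are four compile modes:

REG_EXTENDED: use the more powerful extended regular expression syntax

REG_ICASE: ignore case

REG_NOSUB: do not store the positions of the match results

REG_NEWLINE: recognize newline characters, so that '$' matches at the end of each line and '^' matches at the start of each line. Without this flag, newlines get no special treatment and the whole text is handled as a single string.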
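The effect of the compile flags can be checked with a small helper. This is only a sketch: the function name `matches` is made up for illustration, and it always adds REG_NOSUB because it only needs a yes/no answer.

```cpp
#include <regex.h>

// Hypothetical helper (not from the original article): compile `pattern`
// with the given cflags, test it against `text`, then free the regex.
bool matches(const char* pattern, const char* text, int cflags) {
    regex_t reg;
    // REG_NOSUB: only a yes/no result is needed, no match positions.
    if (regcomp(&reg, pattern, cflags | REG_NOSUB) != 0) {
        return false;  // the pattern failed to compile
    }
    bool ok = (regexec(&reg, text, 0, nullptr, 0) == 0);
    regfree(&reg);
    return ok;
}
```

For example, `matches("sina", "FINANCE.SINA.CN", REG_EXTENDED)` is false while adding `REG_ICASE` makes it true, and `"^stock"` only matches the second line of `"news\nstock"` when `REG_NEWLINE` is set.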

Second, run the match

int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);

preg: pointer to the compiled regular expression;

string: the target string;

nmatch: length of the pmatch array;

pmatch: array of structures that receives the position information of the matched text;

eflags: match mode

There are two match modes:

REG_NOTBOL: The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above). This flag may be used when different portions of a string are passed to regexec and the beginning of the string should not be interpreted as the beginning of the line.

REG_NOTEOL: The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).
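As a rough illustration of pmatch and eflags, the sketch below reads a capture group out of the regmatch_t array and shows REG_NOTBOL disabling '^'. The helper names `capture` and `search_with_eflags` are hypothetical, not part of the POSIX API.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: return the text of capture group `group` from the
// first match of `pattern` in `text`, or "" when nothing matched.
std::string capture(const char* pattern, const char* text, size_t group) {
    regex_t reg;
    if (regcomp(&reg, pattern, REG_EXTENDED) != 0) return "";
    const size_t nmatch = 10;
    regmatch_t pm[nmatch];  // pm[0] is the whole match, pm[1..] the groups
    std::string out;
    if (group < nmatch && regexec(&reg, text, nmatch, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        // rm_so / rm_eo are byte offsets into `text` (start, one past end).
        out.assign(text + pm[group].rm_so,
                   static_cast<size_t>(pm[group].rm_eo - pm[group].rm_so));
    }
    regfree(&reg);
    return out;
}

// Hypothetical helper: search with caller-supplied eflags (e.g. REG_NOTBOL).
bool search_with_eflags(const char* pattern, const char* text, int eflags) {
    regex_t reg;
    if (regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB) != 0) return false;
    bool ok = (regexec(&reg, text, 0, nullptr, eflags) == 0);
    regfree(&reg);
    return ok;
}
```

Note that capture() must not compile with REG_NOSUB, since that flag tells the engine not to fill in pmatch at all.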

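Finally, free the memory
void regfree(regex_t *preg);
After you are done with a compiled regular expression, or before recompiling a different one into the same variable, always call this function to release it.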

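Additionally, error handling
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
When regcomp or regexec reports an error, call this function to obtain a string describing the error.
errcode: the error code returned by regcomp or regexec.
preg: the regular expression compiled with regcomp; this value may be NULL.
errbuf: pointer to the memory that receives the error message.
errbuf_size: the length of the buffer. If the error message is longer than this value, regerror truncates it, but it still returns the length of the complete message. Calling regerror with a zero-size buffer first therefore yields the length required.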
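The two-call pattern for regerror can be sketched as follows. The helper name `compile_error` is an assumption made for this example; it returns an empty string when the pattern compiles.

```cpp
#include <regex.h>
#include <string>
#include <vector>

// Hypothetical helper: try to compile `pattern`; return "" on success or
// the regerror() message on failure.
std::string compile_error(const char* pattern) {
    regex_t reg;
    int rc = regcomp(&reg, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&reg);
        return "";
    }
    // First call: zero-size buffer, only asks for the required length
    // (the returned length includes the terminating '\0').
    size_t len = regerror(rc, &reg, nullptr, 0);
    if (len == 0) return "unknown regex error";
    std::vector<char> buf(len);
    // Second call: the buffer is now large enough, so nothing is truncated.
    regerror(rc, &reg, buf.data(), buf.size());
    return std::string(buf.data());
}
```

The exact wording of the message depends on the C library, so callers should treat it as text for humans rather than something to parse.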

My own test usage was fairly simple, so I used these functions directly. Speed is discussed further below.

2. C++ regex

/*
 * written by xingming
 * time: 2012-10-19 15:51:53
 * for: test regex
 */
#include <regex>
#include <iostream>
#include <stdio.h>
#include <string>

using namespace std;

int main(int argc, char** argv)
{
    // Matches exactly one digit; regex_match below requires the whole input to match.
    regex pattern("[[:digit:]]", regex_constants::extended);
    printf("input strings:\n");
    string buf;
    while (cin >> buf) {
        printf("*******\n%s\n********\n", buf.c_str());
        if (buf == "quit") {
            printf("quit just now!\n");
            break;
        }
        match_results<string::const_iterator> result;
        printf("run compare now! '%s'\n", buf.c_str());
        bool valid = regex_match(buf, result, pattern);
        printf("compare over now! '%s'\n", buf.c_str());
        if (!valid)
            printf("no match!\n");
        else
            printf("ok\n");
    }
    return 0;
}

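I honestly don't want to say much about the C++ one. During testing I found that for character matching, 'a' matches, a+ also matches, and [[:w:]] matches any single character, but [[:w:]]+ still matches only one character: the + seems to have no effect. In the end I simply gave up on this great C++ regex. If any expert knows what I did wrong here, I would sincerely appreciate being told. Thank you.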
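One possible explanation, stated as an assumption because the article does not name the compiler: the libstdc++ shipped with GCC releases before 4.9 had an incomplete <regex>, so quantifiers could misbehave there. Separately, regex_match succeeds only when the entire input matches, whereas regex_search looks for a match anywhere in the input. On a standard library with a complete <regex>, the + quantifier behaves as in this sketch (the helper names are made up for illustration):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the WHOLE string is one or more digits.
bool whole_match(const std::string& s) {
    static const std::regex pattern("[[:digit:]]+",
                                    std::regex_constants::extended);
    return std::regex_match(s, pattern);
}

// Hypothetical helper: true when the string CONTAINS one or more digits.
bool contains_match(const std::string& s) {
    static const std::regex pattern("[[:digit:]]+",
                                    std::regex_constants::extended);
    return std::regex_search(s, pattern);
}
```

With these helpers, "12345" is a whole match, while "abc123" is not a whole match but does contain one.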

3. boost regex

/*
 * written by xingming
 * for: test boost regex
 * time: 2012-10-23 11:35:33
 */
#include <iostream>
#include <string>
#include <sys/time.h>
#include "boost/regex.hpp"

using namespace std;

const int times = 10000000;

int main()
{
    // Names are qualified with boost:: to avoid any clash with std::regex.
    boost::regex pattern("finance\\.sina\\.cn|stock1\\.sina\\.cn|3g\\.sina\\.com\\.cn.*(channel=finance|_finance$|ch=stock|/stock/)|dp\\.sina\\.cn/.*ch=9&");
    cout << "input strings:" << endl;
    timeval start, end;
    gettimeofday(&start, NULL);
    string input[] = {
        "finance.sina.cn/google.com/baidu.com.google.sina.cn",
        "3g.com.sina.cn.google.com.baidu.com.sina.egooooooooo",
        "http://3g.sina.com.cn/google.baiduchannel=financegogo"
    };
    for (int i = 0; i < times; ++i) {
        for (int j = 0; j < 3; ++j) {
            //if(input=="quit")
            //    break;
            //cout<<"string:'"<<input<<'\''<<endl;
            boost::cmatch what;
            if (boost::regex_search(input[j].c_str(), what, pattern))
                ;  // cout<<"OK!"<<endl;
            else
                ;  // cout<<"error!"<<endl;
        }
    }
    gettimeofday(&end, NULL);
    // Elapsed time in microseconds (long instead of the non-standard uint).
    long time = (end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec;
    cout << time / 1000000 << " s and " << time % 1000000 << " us." << endl;
    return 0;
}

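The boost regex needs little introduction. Ask around how to use regular expressions in C++ and 90% of people will recommend boost regex. It is convenient to use, the library is powerful, and plenty of material is available, so I won't elaborate further.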

4. Comparison

All times are in microseconds (us). "w" stands for 10,000, so 10w = 100,000 loop iterations; each iteration matches all three test strings.

Iterations   boost regex (us)             C regex (us)
Average      218,699 / 218,700            90,631 / 90,632
10w          2,186,109                    902,658
             2,194,524                    907,547
             2,188,762                    915,934
             2,186,343                    891,250
             2,192,902                    903,899
             2,191,350                    900,113
100w         25,606,021                   9,030,497
             28,633,984                   9,016,080
             28,956,997                   8,939,238
             26,912,245                   8,953,076
             26,909,788                   9,041,565
             27,669,546                   8,983,831
1000w        218,126,580 / 218,126,581    89,609,061 / 89,609,062

Test strings (the same for both engines):

{"finance.sina.cn/google.com/baidu.com.google.sina.cn",
"3g.com.sina.cn.google.com.baidu.com.sina.egooooooooo",
"http://3g.sina.com.cn/google.baiduchannel=financegogo"};

Summary: the speed of C regex surprised me. Compared with boost, C regex is almost 3 times faster, so the choice of regex engine looks settled. The regex and strings used for the table above were the same for both engines (the strings in the C regex code listing were lengthened afterwards). C regex runs at roughly 30+w matches per second (over 300,000), while boost stays just under 15w matches per second (150,000), so the comparison is clear. The speed of C regex was already a surprise, but my later tests surprised me even more. I used to work with .NET regular expressions quite a lot, so I wrote a .NET version for comparison:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace 平常測試
{
    class Program
    {
        static int times = 1000000;

        static void Main(string[] args)
        {
            Regex reg = new Regex(@"(?>finance\.sina\.cn|stock1\.sina\.cn|3g\.sina\.com\.cn.*(?:channel=finance|_finance$|ch=stock|/stock/)|dp.sina.cn/.*ch=9&)", RegexOptions.Compiled);
            string[] str = new string[]
            {
                @"finance.sina.cn/google.com/baidu.com.google.sina.cn",
                @"3g.com.sina.cn.google.com.baidu.com.sina.egooooooooo",
                @"http://3g.sina.com.cn/google.baiduchannel=financegogo"
            };
            int tt = 0;
            DateTime start = DateTime.Now;
            for (int i = 0; i < times; ++i)
            {
                for (int j = 0; j < 3; ++j)
                {
                    if (reg.IsMatch(str[j]))
                        ;
                    //Console.WriteLine("OK!");
                    //else
                    //Console.WriteLine("Error!");
                }
            }
            DateTime end = DateTime.Now;
            Console.WriteLine((end - start).TotalMilliseconds);
            Console.WriteLine(tt);
            Console.ReadKey();
        }
    }
}

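It turned out that without RegexOptions.Compiled the .NET regex runs at about the same speed as C regex, and with RegexOptions.Compiled it is about twice as fast as C regex, which filled me with respect for the people at Microsoft.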

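But then I went back to the blog's description of C regex and found that a compile mode can be specified when declaring the regex. I added REG_NOSUB as in the code above (it was not used in the earlier tests), and the result was exciting: the C regex matching speed reached 300+w matches per second (over 3,000,000), nearly 10 times faster than the earlier code without REG_NOSUB.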
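A comparison of this kind can be reproduced with a sketch like the one below. It reuses the pattern and the three short test strings from the article; the struct `BenchResult`, the function `run_bench`, and the use of std::chrono instead of gettimeofday are choices made for this example, not the author's code.

```cpp
#include <regex.h>
#include <chrono>

// Hypothetical result type: number of successful matches and elapsed time.
struct BenchResult {
    long matched;
    long long micros;
};

// Hypothetical benchmark: compile with `cflags`, then match the three test
// strings `iterations` times. Pass REG_EXTENDED or REG_EXTENDED | REG_NOSUB.
BenchResult run_bench(int cflags, int iterations) {
    static const char* inputs[3] = {
        "finance.sina.cn/google.com/baidu.com.google.sina.cn",
        "3g.com.sina.cn.google.com.baidu.com.sina.egooooooooo",
        "http://3g.sina.com.cn/google.baiduchannel=financegogo"
    };
    const char* pattern =
        "finance\\.sina\\.cn|stock1\\.sina\\.cn|3g\\.sina\\.com\\.cn.*"
        "(channel=finance|_finance$|ch=stock|/stock/)|dp.sina.cn/.*ch=9&";
    BenchResult r = {0, 0};
    regex_t reg;
    if (regcomp(&reg, pattern, cflags) != 0) return r;
    regmatch_t pm[10];  // ignored by regexec when REG_NOSUB was used
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 3; ++j) {
            if (regexec(&reg, inputs[j], 10, pm, 0) == 0) ++r.matched;
        }
    }
    auto end = std::chrono::steady_clock::now();
    r.micros = std::chrono::duration_cast<std::chrono::microseconds>(
                   end - start).count();
    regfree(&reg);
    return r;
}
```

Calling `run_bench(REG_EXTENDED, n)` and `run_bench(REG_EXTENDED | REG_NOSUB, n)` should report the same `matched` count, since REG_NOSUB changes only whether positions are recorded; the `micros` values will vary by machine and C library, so no particular speedup is implied here.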

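After that I changed the test strings, doubling their length to about 100 characters each (as shown in the code). The matching speed dropped, but it still reached about 100w matches per second (1,000,000), which certainly meets our current needs.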

The result is clear: C regex is the one to choose.
