Python's re module: compile() and findall() explained
First, let's look at how the official documentation describes compile:
re.compile(pattern, flags=0)

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods, described below.

The expression's behaviour can be modified by specifying a flags value. Values can be any of the following variables, combined using bitwise OR (the | operator).

The sequence:

    prog = re.compile(pattern)
    result = prog.match(string)

is equivalent to

    result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

Note: The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn't worry about compiling regular expressions.

Here is the description of the DOTALL flag:

re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
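The two descriptions above can be tried together in a minimal sketch. The sample text and the pattern are hypothetical, chosen only to show that a compiled object and the module-level function give the same match, that flags combine with |, and what DOTALL changes.

```python
import re

# Hypothetical sample text (not from the original article); note the newline.
text = "<p>Order 66\nshipped</p>"

# Compile once; the two flags are combined with bitwise OR.
prog = re.compile(r"<P>(.*)</P>", re.IGNORECASE | re.DOTALL)

# Matching through the compiled object...
compiled_result = prog.search(text)

# ...is equivalent to calling the module-level function with the same flags.
direct_result = re.search(r"<P>(.*)</P>", text, re.IGNORECASE | re.DOTALL)

print(compiled_result.group(1) == direct_result.group(1))  # True
print(repr(compiled_result.group(1)))  # 'Order 66\nshipped'

# Without DOTALL, '.' stops at the newline, so the pattern never reaches </p>.
no_dotall = re.search(r"<p>(.*)</p>", text)
print(no_dotall)  # None
```

If the same pattern is applied to many strings, keeping prog around avoids repeating the pattern and flags at every call site.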
Here is the description of findall:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
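As a quick illustration of "non-overlapping" and of the group rule in the description above, here is a small sketch. Both input strings are hypothetical samples, not taken from the article.

```python
import re

# Matches never overlap: "aaaa" holds two non-overlapping "aa", not three.
doubles = re.findall(r"aa", "aaaa")
print(doubles)  # ['aa', 'aa']

# With two groups in the pattern, each list element is a 2-tuple of the groups,
# returned in the order the matches are found (left to right).
dates = re.findall(r"(\d{4})-(\d{2})", "2020-01 and 2021-12")
print(dates)  # [('2020', '01'), ('2021', '12')]
```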
Here is an example to walk through (the patterns are written as raw strings so that \w and \s are not treated as string escapes):

>>> import re
>>> s = "adfad asdfasdf asdfas asdfawef asd adsfas "
>>> reObj1 = re.compile(r'((\w+)\s+\w+)')
>>> reObj1.findall(s)
[('adfad asdfasdf', 'adfad'), ('asdfas asdfawef', 'asdfas'), ('asd adsfas', 'asd')]
>>> reObj2 = re.compile(r'(\w+)\s+\w+')
>>> reObj2.findall(s)
['adfad', 'asdfas', 'asd']
>>> reObj3 = re.compile(r'\w+\s+\w+')
>>> reObj3.findall(s)
['adfad asdfasdf', 'asdfas asdfawef', 'asd adsfas']

The results are explained below (the original post illustrates this code with a figure):
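findall always returns a list of every match of the regular expression in the string. What matters here is the form each "result" takes, that is, what information each element of the returned list contains. In the rules below, "parentheses" means capturing groups.
1. When the regular expression contains more than one pair of parentheses, each list element is a tuple of strings. The tuple holds one string per pair of parentheses, each string is the text matched by the expression inside that pair, and the strings are ordered by where each opening parenthesis appears.
2. When the regular expression contains exactly one pair of parentheses, each list element is a string holding the text matched by the expression inside the parentheses (not the text matched by the whole regular expression).
3. When the regular expression contains no parentheses, each list element is a string holding the text matched by the whole regular expression.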
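The three rules above apply to capturing groups only. As a minimal sketch that goes slightly beyond the article's example, the standard (?:...) non-capturing syntax lets you group without changing what findall returns, and re.finditer gives access to both the whole match and the groups. The string s is the one used in the example above.

```python
import re

s = "adfad asdfasdf asdfas asdfawef asd adsfas "

# (?:...) groups without capturing, so rule 3 still applies:
# each element is the text matched by the whole expression.
whole_matches = re.findall(r"(?:\w+)\s+\w+", s)
print(whole_matches)
# ['adfad asdfasdf', 'asdfas asdfawef', 'asd adsfas']

# finditer yields match objects; group(0) is the whole match,
# group(1) is the first capturing group.
both = [(m.group(0), m.group(1)) for m in re.finditer(r"(\w+)\s+\w+", s)]
print(both)
# [('adfad asdfasdf', 'adfad'), ('asdfas asdfawef', 'asdfas'), ('asd adsfas', 'asd')]
```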
The list returned by re.compile(...).findall(data) can be turned into a str either by indexing into the list or with str.join(), which makes further processing easier. Here is the Python 3.5 documentation for str.join():

str.join(iterable)

Return a string which is the concatenation of the strings in the iterable iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.

With the above in hand, regular expressions in a crawler should be much easier to work with.
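A minimal sketch of both approaches, reusing the string s from the earlier example. The ", " separator and the variable names are arbitrary choices for illustration.

```python
import re

s = "adfad asdfasdf asdfas asdfawef asd adsfas "

# One group in the pattern: findall returns a list of strings.
first_words = re.findall(r"(\w+)\s+\w+", s)

# Indexing picks out a single str.
first = first_words[0]
print(first)  # adfad

# str.join concatenates all elements; the separator is the string
# the method is called on.
joined = ", ".join(first_words)
print(joined)  # adfad, asdfas, asd

# Two groups: the elements are tuples, so join raises TypeError.
tuples = re.findall(r"((\w+)\s+\w+)", s)
try:
    ", ".join(tuples)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

# Pick one field from each tuple before joining.
joined_whole = ", ".join(whole for whole, _first in tuples)
print(joined_whole)  # adfad asdfasdf, asdfas asdfawef, asd adsfas
```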